Evolution from total variation to nonlinear sparsifying transform for sparse-view CT image reconstruction

Sparse-view CT has been widely studied as an effective strategy for reducing radiation dose to patients. However, conventional image reconstruction algorithms, such as the filtered back-projection method and classical algebraic reconstruction techniques, are no longer adequate for the image reconstruction task of sparse-view CT. A new principle, called compressed sensing (CS), has therefore been developed to provide an effective solution to the ill-posed inverse problem of sparse-view CT image reconstruction. Total variation (TV) minimization, the most extensively studied among the existing CS techniques, has been recognized in the image reconstruction community as a powerful tool for dealing with this difficult problem. In recent years, however, the drawbacks of TV have been increasingly reported: the appearance of patchy artifacts, the depiction of incorrect object boundaries, and the loss of image textures or smooth intensity changes. These degradations appear especially when reconstructing actual CT images with complicated intensity changes. To address these drawbacks, a series of advanced algorithms using nonlinear sparsifying transforms (NLST) has been proposed very recently. NLST-based CS is built on a framework different from that of TV, and it achieves an improvement in image quality. Since it is a relatively new idea, to the best of our knowledge, few studies discuss comprehensively how the image quality improvement arises in comparison with the conventional TV method. In this study, we investigated the image quality differences between conventional TV minimization and NLST-based CS, as well as the differences among different kinds of NLST-based CS algorithms, in sparse-view CT image reconstruction. More specifically, reconstructions of actual CT images of different body parts were carried out to demonstrate the image quality differences.

Sparse-view CT has been widely studied as an effective strategy for reducing radiation dose. Compared with existing CT systems, where several hundred to over a thousand projection views are required per rotation, as shown in Fig. 1, sparse-view CT has the potential to reduce the number of projection views to 1/10 of the number employed in current commercial CT scanners. Thanks to the reduction in projection views, patient radiation dose is also significantly decreased. However, due to the insufficient sampling of sparse-view measurements, neither the conventional filtered back-projection (FBP) method nor the classical algebraic reconstruction technique (ART) can achieve sufficient diagnostic image quality any more. Two early works in the mathematical literature demonstrated that the solution to image reconstruction from a small number of projection data is not unique [9,10]. Therefore, some prior knowledge about the object needs to be employed to achieve satisfactory reconstructions. Many researchers have investigated this problem with a variety of reconstruction methods and forms of prior knowledge [11,12], but none succeeded in discovering a method usable in clinical routine. The situation changed dramatically during the 2000s thanks to the discovery of compressed sensing (CS) theory. CS was originally proposed by Donoho [13] and Candes et al. [14], and it has now become the gold standard for solving a class of difficult inverse problems including sparse-view CT image reconstruction. CS provides an innovative framework for solving underdetermined ill-posed inverse problems. Generally, a cost function consisting of a data fidelity term and a penalty term called regularization is designed. Then, an optimal solution is obtained by minimizing the cost function with convex optimization techniques.
The data fidelity term of CS is regarded as the fundamental part, and the least-squares error between the estimated projection data and the actually measured projection data is commonly used. The regularization plays an important role in compensating for the missing information in the incomplete sparse-view projection data, and different regularizations lead to different solutions. Total variation (TV) is attracting much attention as a regularization used in CS, because it has achieved great success in producing high image quality in the task of sparse-view CT reconstruction. The TV is defined as the sum of the intensity gradient magnitudes over all pixels, and it was first proposed by Rudin, Osher and Fatemi [15] in 1992. In early stages, the TV was used as a penalty term in iterative CT image reconstruction algorithms [16][17][18][19], with significant success in smoothing out statistical noise in the image while preserving sharp object boundaries. Subsequently, Sidky et al. extended the application of TV to a constrained optimization approach, where the TV was the cost function and the data fidelity was a constraint [20,21]. In this case, they found the solution minimizing the TV under the hard constraint corresponding to the data fidelity. This extension made the TV approach more robust in treating underdetermined problems such as sparse-view and limited-angle reconstruction. Lately, a further modification of the TV, i.e. the weighted TV, has been proposed, which takes the edge properties of natural images into account [22,23].

This proposal has the potential to avoid, to some extent, the excessive smoothing of boundaries that occurs in the TV. The TV is becoming the most standard CS approach in CT image reconstruction. In recent years, however, the drawbacks of TV, such as the depiction of incorrect object boundaries and the loss of image textures or smooth intensity changes, are being increasingly reported [24][25][26], which accelerates comprehensive understanding of the limitations of TV. Additionally, a decade ago, Herman and Davidi claimed that the TV favors images without intensity changes and has limitations in exactly reconstructing images with complicated structures [27]. Furthermore, they convincingly demonstrated the mechanism of this TV limitation. Therefore, a strong demand has been raised for developing a more powerful CS framework for sparse-view CT image reconstruction.

The aim of CT image reconstruction is to recover an object from measured projection data. When iterative methods are used for image reconstruction, the problem can be formulated as solving the linear equation

A x = b,

where b = (b_1, b_2, \cdots, b_I)^T denotes the measured projection data, x = (x_1, x_2, \cdots, x_J)^T represents the attenuation coefficients of the object to be reconstructed, and A = \{a_{ij}\} is the I \times J system matrix.

When the dimension of b is smaller than the dimension of x (i.e. I < J), the problem becomes underdetermined, so that reconstructing an accurate image becomes challenging; this is the standard mathematical setup of sparse-view CT reconstruction. In such a situation, CS is usually utilized to obtain a feasible image, in which a cost function consisting of a data fidelity term L(x) and a regularization term U(x) is minimized:

f(x) = L(x) + \beta U(x),   (1)

where L(x) is defined by the least-squares error \|Ax - b\|^2, U(x) is defined by Eq. (2) below, and \beta is the hyperparameter controlling the strength of the regularization.
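As a concrete illustration of this underdetermination, the following toy sketch (hypothetical numbers; `forward_project` is an illustrative helper, not from the paper) shows two different images that reproduce the same projection data exactly when I < J:

```python
# Toy underdetermined system: I = 2 measurements, J = 3 unknowns.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]        # 2 x 3 system matrix (I < J)
b = [2.0, 2.0]               # "measured" projection data

def forward_project(A, x):
    """Return A x, the projection data predicted by image x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

x1 = [1.0, 1.0, 1.0]         # one candidate image
x2 = [2.0, 0.0, 2.0]         # a different candidate image
# Both candidates fit the data exactly, so the data alone cannot
# decide between them -- prior knowledge (regularization) is needed.
print(forward_project(A, x1) == b, forward_project(A, x2) == b)  # True True
```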
The regularization term is expressed as

U(x) = \|Wx\|_1,   (2)

where W represents the sparsifying transform which converts x into a sparse vector, and T denotes the dimension of the transformed coefficient vector Wx. In Eq. (2), the norm \|z\|_1 = \sum_{t=1}^{T} |z_t| is called the L1 norm, which is known to have a superior ability to pick up sparse solutions [6,32]. The sparsifying transform W is usually designed as a high-pass filter which extracts the high-frequency components of the object (i.e. object boundaries and textures). In this paper, for the purpose of this study, which is to compare the image quality of the TV and the NLST, we design W in two different forms, corresponding to the TV and the NLST, respectively:

U_{TV}(x) = \sum_{j=1}^{J} \sqrt{(h_j^T x)^2 + (v_j^T x)^2},   (3)

U_{NLST}(x) = \|x - Nx\|_1 = \sum_{j=1}^{J} |x_j - (Nx)_j|.   (4)

In Eq. (3), h_j^T x and v_j^T x are inner-product representations of the finite-difference operations around the j-th pixel along the horizontal and vertical directions, respectively, so that Eq. (3) is the TV, i.e. the L1 norm of the magnitude of the intensity gradient of a given image. In Eq. (4), N denotes an arbitrary nonlinear low-pass filter. It is known that nonlinear filters frequently show satisfactory performance in image processing tasks such as denoising. In this study, we investigate the potential effectiveness of nonlinear filters in the task of image reconstruction, comparing the results of the NLST with those of the TV approach. We utilized the median filter, the bilateral filter and the nonlocal means (NLM) filter as the nonlinear filter N to perform the sparsifying transform.
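To make the two regularizers concrete, the sketch below evaluates Eq. (3) (isotropic TV) and Eq. (4) with N chosen as a 3×3 median filter on a toy image; the boundary handling (zero differences at the last row/column, shrinking windows at the edges) is an assumption for illustration only:

```python
import math
import statistics

# Toy 3x3 image with a single bright pixel in the center.
img = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]

def tv(u):
    """Eq. (3): L1 norm of the gradient magnitude (isotropic TV)."""
    n, m = len(u), len(u[0])
    total = 0.0
    for i in range(n):
        for j in range(m):
            dh = u[i][j + 1] - u[i][j] if j + 1 < m else 0.0
            dv = u[i + 1][j] - u[i][j] if i + 1 < n else 0.0
            total += math.hypot(dh, dv)
    return total

def nlst_median(u):
    """Eq. (4) with N = 3x3 median filter: sum_j |x_j - (N x)_j|."""
    n, m = len(u), len(u[0])
    total = 0.0
    for i in range(n):
        for j in range(m):
            patch = [u[p][q]
                     for p in range(max(0, i - 1), min(n, i + 2))
                     for q in range(max(0, j - 1), min(m, j + 2))]
            total += abs(u[i][j] - statistics.median(patch))
    return total

print(tv(img))           # 2 + sqrt(2): three nonzero gradient magnitudes
print(nlst_median(img))  # 1.0: only the central pixel differs from its median
```

Both transforms assign a small value to this nearly flat image, which is exactly the sparsity that the L1 norm rewards.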

Mathematical tools

Once the cost function is determined, the crucial subsequent step is to derive an iterative algorithm that finds the optimal solution by minimizing the cost function. We utilize techniques of convex optimization to achieve this task. Before going into the concrete derivation, it is indispensable to introduce two mathematical tools together with some related basic prior knowledge. Let us consider a convex minimization problem formulated as

\min_{x} f(x),   (5)

where we assume that f(x) is a possibly non-differentiable lower semi-continuous (lsc) convex function.

[Proximity Operator and Proximal Algorithm] [33] The proximity (prox) operator corresponding to f(x) is defined by

prox_{\alpha f}(z) = \arg\min_{x} \left[ f(x) + \frac{1}{2\alpha} \|x - z\|^2 \right],   (6)

where the parameter \alpha is called the step-size parameter. The prox operator has two convenient properties that facilitate its use in the design of iterative algorithms. One property is that it can be defined for any lsc convex function, even a non-differentiable one like the L1 norm. The other property is that its fixed points, i.e. x satisfying x = prox_{\alpha f}(x), coincide with the minimizers of f(x) for any \alpha > 0. Furthermore, the prox operator is necessarily a non-expansive mapping. These properties allow us to use the following iteration formula to find a minimizer of f(x):

x^{(k+1)} = prox_{\alpha f}(x^{(k)}),   (7)

where k denotes the iteration number. This iterative algorithm is called the proximal minimization algorithm, and it provides a powerful framework for non-differentiable convex minimization.
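For the L1 norm, the prox operator defined above has a well-known closed form, the soft-thresholding operation, which is why the prox framework handles this non-differentiable term so easily. A minimal elementwise sketch (the threshold plays the role of the step-size times the regularization weight):

```python
def soft_threshold(z, tau):
    """Elementwise prox of tau*||.||_1: argmin_x tau*|x| + (x - z)^2 / 2."""
    out = []
    for zi in z:
        if zi > tau:
            out.append(zi - tau)      # shrink positive entries toward zero
        elif zi < -tau:
            out.append(zi + tau)      # shrink negative entries toward zero
        else:
            out.append(0.0)           # small entries are set exactly to zero
    return out

print(soft_threshold([3.0, -0.5, 1.5], 1.0))  # -> [2.0, 0.0, 0.5]
```

Note that the operator maps small entries exactly to zero, which is precisely the mechanism by which L1 regularization produces sparse solutions.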

[Proximal Gradient Method] Let us consider a convex minimization problem expressed as

\min_{x} f(x) = g(x) + h(x),   (8)

where g(x) is a smooth convex function and h(x) is a possibly non-differentiable convex function. We consider the situation where the prox operator prox_{\alpha f}(z) corresponding to f(x) is expensive to compute, but prox_{\alpha g}(z) and prox_{\alpha h}(z) corresponding to the sub-cost functions g(x) and h(x) can be computed easily. The proximal gradient method is a framework for constructing an iterative algorithm to minimize f(x) under such conditions [34]. Basically, the algorithm first computes an intermediate vector c(x^{(k)}) by applying a gradient descent step to the smooth function g(x), and then obtains the updated vector x^{(k+1)} by computing the prox operator prox_{\alpha h}(c(x^{(k)})) corresponding to h(x).

The processing procedure can be summarized as

c(x^{(k)}) = x^{(k)} - \alpha^{(k)} \nabla g(x^{(k)}),
x^{(k+1)} = prox_{\alpha^{(k)} h}(c(x^{(k)})),   (9)

where k denotes the iteration number and \alpha^{(k)} is the step-size parameter, which depends on k. With respect to the convergence property of Eq. (9), Passty proved the following theorem [35] in a more general setting with multiple (i.e. more than two) sub-cost functions.
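The two steps of Eq. (9) can be sketched on a toy scalar problem (hypothetical numbers: g(x) = (x - 4)^2 and h(x) = 2|x|, whose minimizer is x = 3 from the optimality condition 2(x - 4) + 2 = 0 for x > 0):

```python
# Minimal proximal gradient sketch: gradient step on the smooth part g,
# then the prox (soft-thresholding) of the non-smooth part h.
def prox_abs(z, tau):
    """prox of tau*|.|: scalar soft-thresholding."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

alpha, beta = 0.25, 2.0
x = 0.0
for k in range(100):
    c = x - alpha * 2.0 * (x - 4.0)   # gradient step on g(x) = (x - 4)^2
    x = prox_abs(c, alpha * beta)     # prox step on h(x) = beta*|x|
print(x)  # -> 3.0 (the exact minimizer)
```

A fixed step-size suffices here because g is smooth with a known Lipschitz gradient; the diminishing step-size of the theorem below covers the more general case.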

Theorem. Consider the ergodic average of the iterates x^{(k')} (k' = 0, 1, 2, \ldots, k) defined by

\bar{x}^{(k)} = \frac{\sum_{k'=0}^{k} \alpha^{(k')} x^{(k')}}{\sum_{k'=0}^{k} \alpha^{(k')}}.   (10)

Then \bar{x}^{(k)} converges to a minimizer of f(x) when a diminishing step-size satisfying the following rules is used:

\sum_{k=0}^{\infty} \alpha^{(k)} = \infty, \qquad \sum_{k=0}^{\infty} \left(\alpha^{(k)}\right)^2 < \infty.   (11)
We note that the above convergence is called ergodic convergence, in which the ergodic average converges to a minimizer of f(x) more easily than the sequence x^{(k)} itself. In practice, however, we have never observed a situation in which the ergodic convergence occurs while the sequence x^{(k)} itself fails to converge. Therefore, we conjecture that the algorithm expressed by Eq. (9) can be used in practice without concern about non-convergence.
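A short sketch of the ergodic average used in the theorem above (assuming the standard step-size-weighted form; the iterates and step-sizes are illustrative):

```python
# Step-size-weighted ergodic average: x_bar = sum(a_k * x_k) / sum(a_k).
def ergodic_average(iterates, steps):
    num = sum(a * x for a, x in zip(steps, iterates))
    den = sum(steps)
    return num / den

# Toy scalar iterates with a diminishing step-size a_k = 1 / (k + 1).
iterates = [4.0, 2.0, 1.0, 1.0]
steps = [1.0 / (k + 1) for k in range(4)]
print(ergodic_average(iterates, steps))
```

Early iterates carry larger weights, so the average lags behind the raw sequence; this is the sense in which ergodic convergence is weaker than convergence of x^(k) itself.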

Iterative algorithm for total variation reconstruction

In this section, we explain how to construct the iterative algorithm for the TV minimization by using the proximal gradient framework. The cost function of the TV reconstruction can be expressed in detail as

f(x) = \|Ax - b\|^2 + \beta \sum_{j=1}^{J} \sqrt{(h_j^T x)^2 + (v_j^T x)^2}.   (12)

We first split the cost function in Eq. (12) into the sum of sub-cost functions as

g(x) = \|Ax - b\|^2, \qquad h(x) = \beta \sum_{j=1}^{J} \sqrt{(h_j^T x)^2 + (v_j^T x)^2}.   (13)

Then the gradient descent step and the prox operator can be applied to each sub-cost function alternately, which leads to the two-step iterative algorithm

c(x^{(k)}) = x^{(k)} - \alpha^{(k)} \nabla g(x^{(k)}),
x^{(k+1)} = prox_{\alpha^{(k)} h}(c(x^{(k)})),   (14)

where k denotes the iteration number. The gradient descent step in Eq. (14) can be written explicitly as

c(x^{(k)}) = x^{(k)} - 2\alpha^{(k)} A^T (A x^{(k)} - b),   (15)

where A^T indicates the transpose of the system matrix A. The iteration formula in Eq. (15) possesses a well-known structure, the same as that of the steepest descent method applied to the minimization of \|Ax - b\|^2.
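The gradient step of Eq. (15) is just one product with A, a residual, and one product with A^T; a toy sketch with hypothetical numbers (the helper names are illustrative):

```python
# One data-fidelity gradient step: c = x - 2*alpha*A^T (A x - b).
def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 0.0],
     [0.0, 2.0]]
b = [1.0, 2.0]
alpha = 0.1
x = [0.0, 0.0]

residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]   # A x - b
grad = matvec(transpose(A), residual)                      # A^T (A x - b)
c = [xi - 2.0 * alpha * gi for xi, gi in zip(x, grad)]
print(c)  # -> [0.2, 0.8]
```

In a real CT implementation A is never stored densely; the products with A and A^T are the forward projection and back-projection operators.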

Next, we deal with the computation of the prox operator corresponding to the TV term (i.e. solving the second equation of Eq. (14)). This minimization problem is the same as that appearing in so-called TV denoising (the ROF model), so we can use a standard algorithm such as Chambolle's projection algorithm [15,36]. In our implementation of image reconstruction, we used Chambolle's projection algorithm; a detailed explanation of this algorithm is given in Appendix A.
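Chambolle's algorithm operates on a 2-D dual field; as an illustration of the dual-projection idea it relies on, the sketch below solves the 1-D analogue of the ROF problem, min_x 0.5*||x - f||^2 + beta*TV(x), by projected gradient ascent on the dual variable. This is a simplified stand-in for exposition, not the algorithm of Appendix A:

```python
def tv_denoise_1d(f, beta, iters=500, tau=0.25):
    """Solve min_x 0.5*||x - f||^2 + beta*sum_i |x[i+1] - x[i]| via the dual:
    optimize over |p_i| <= beta, recovering the primal as x = f - D^T p."""
    n = len(f)
    p = [0.0] * (n - 1)                  # one dual variable per pixel jump
    for _ in range(iters):
        x = _primal(f, p)
        # gradient ascent step on the dual, then project onto [-beta, beta]
        p = [max(-beta, min(beta, p[i] + tau * (x[i + 1] - x[i])))
             for i in range(n - 1)]
    return _primal(f, p)

def _primal(f, p):
    """x = f - D^T p, where (D x)_i = x[i+1] - x[i]."""
    x = list(f)
    for i, pi in enumerate(p):
        x[i] += pi                       # -(D^T p)[i]   contribution
        x[i + 1] -= pi                   # -(D^T p)[i+1] contribution
    return x

denoised = tv_denoise_1d([0.0, 0.0, 1.0, 0.0, 0.0], beta=0.2)
print([round(v, 3) for v in denoised])   # -> [0.1, 0.1, 0.6, 0.1, 0.1]
```

The spike is flattened while the signal mean is preserved, which is characteristic of ROF-type denoising; the step-size tau = 0.25 respects the bound 1/||D D^T||.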

In order to guarantee the convergence of the proposed algorithm, we take a commonly used approach to setting the step-size parameter \alpha^{(k)}:

\alpha^{(k)} = \frac{\alpha_0}{1 + \epsilon k},   (16)

where \alpha_0 > 0 and \epsilon > 0 are pre-specified parameters, and k denotes the iteration number.
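A sketch of a diminishing step-size schedule of this kind (assuming the common reciprocal-decay form alpha_k = alpha_0 / (1 + eps*k); the constants are illustrative): the step-sizes decay toward zero, yet slowly enough that their partial sums still diverge, as the convergence theorem requires.

```python
def step_size(k, alpha0, eps):
    """Diminishing step-size: alpha_k -> 0, but sum_k alpha_k diverges
    while sum_k alpha_k^2 stays finite (harmonic-like decay)."""
    return alpha0 / (1.0 + eps * k)

alphas = [step_size(k, 100.0, 100.0) for k in range(5)]
print(alphas[0])   # -> 100.0, then strictly decreasing toward zero
```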

Thus far, the iterative algorithm for the TV reconstruction can be summarized as Algorithm 1.
Algorithm 1: TV reconstruction.
1: Achieve the intermediate vector c(x^{(k)}) by the gradient descent step of Eq. (15).
2: Compute the prox operator for the TV term to obtain x^{(k+1)} (see Appendix A).

Iterative algorithm for NLST reconstruction

In this section, we explain how to construct the iterative algorithm for the NLST by using the proximal gradient framework. Based on the derivation in Section 2.1, the cost function of the NLST can be expressed as

f(x) = \|Ax - b\|^2 + \beta \|x - Nx\|_1,   (17)

where N denotes a nonlinear low-pass filter. In this study, we utilize three nonlinear filters: the median filter, the bilateral filter and the NLM filter. The specific weight computations for the bilateral and the NLM filters follow their standard definitions, with parameters \delta_1 and \delta_2 controlling the spatial (or patch) similarity and the intensity similarity, respectively. Similarly to the derivation in Section 2.3, we split the cost function into the sum of two sub-cost functions as

g(x) = \|Ax - b\|^2, \qquad h(x) = \beta \|x - Nx\|_1.   (21)

Then, the gradient descent step and the prox operator can be applied to the sub-cost functions alternately, and we obtain the two-step iterative algorithm

c(x^{(k)}) = x^{(k)} - \alpha^{(k)} \nabla g(x^{(k)}),
x^{(k+1)} = prox_{\alpha^{(k)} h}(c(x^{(k)})),   (22)

where k denotes the iteration number. We observe that the first equation in Eq. (22) is exactly the same as that in Eq. (14), so we can use the same iteration formula as in Eq. (15). Next, we explain the solution of the prox operator corresponding to the NLST term (the second equation in Eq. (22)). It is a function consisting of an absolute value term and a quadratic term. The minimization of this function can be performed approximately in closed form by using the so-called soft-thresholding operation, as follows. First, we note that there exists an obstacle to applying the soft-thresholding operation to this minimization, because the term Nx depends on the variable x.

Here, we use a trick called "constant approximation". When the step-size \alpha^{(k)} is set to a sufficiently small value, the unknown x comes arbitrarily close to the intermediate image c(x^{(k)}), so that Nx is well approximated by Nc(x^{(k)}); this converts the absolute value term into a separable form. Thanks to this approximation, the following iterative formula is obtained by applying the soft-thresholding operation:

x_j^{(k+1)} = \left(Nc(x^{(k)})\right)_j + S_{\alpha^{(k)}\beta}\left( c_j(x^{(k)}) - \left(Nc(x^{(k)})\right)_j \right),   (23)

where S_{\tau}(z) = \mathrm{sign}(z)\max(|z| - \tau, 0) denotes the soft-thresholding operation.
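The constant-approximation prox step of Eq. (23) can be sketched as follows (a 1-D toy with N taken as a 3-point median filter; the helper names are illustrative): the filtered image N c is computed once and held fixed, the residual c - N c is soft-thresholded, and the low-pass part is added back.

```python
import statistics

def soft(z, tau):
    """Scalar soft-thresholding."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

def median_filter_1d(c):
    """N c for a 3-point median filter (window shrinks at the borders)."""
    n = len(c)
    return [statistics.median(c[max(0, j - 1):min(n, j + 2)]) for j in range(n)]

def nlst_prox(c, tau):
    nc = median_filter_1d(c)         # N c, held constant during the prox
    # per-pixel update of Eq. (23): x_j = (N c)_j + soft(c_j - (N c)_j, tau)
    return [ncj + soft(cj - ncj, tau) for cj, ncj in zip(c, nc)]

# An isolated spike (e.g. a streak-artifact point) is shrunk toward the
# locally filtered value, while flat regions pass through unchanged.
print(nlst_prox([0.0, 0.0, 1.0, 0.0, 0.0], tau=0.25))
# -> [0.0, 0.0, 0.75, 0.0, 0.0]
```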
With respect to the control of the step-size parameter \alpha^{(k)}, we use the same method as in the TV approach, i.e. \alpha^{(k)} is decreased toward zero according to the diminishing step-size rule in Eq. (16).

In addition, we make the following modification to the algorithm to further improve its performance. Several studies have shown that, when used in image reconstruction or image recovery [28,37], the NLST can cause isolated points in the image. The reason is explained as follows. Nonlinear filters, such as the NLM and bilateral filters, determine the spatially dependent degree of smoothing based on the image characteristics at the current iteration. Strong smoothing is applied when there are many similar intensities among the neighboring pixels. On the contrary, weak smoothing is applied when similar intensities are few in the neighboring pixels. Thanks to this spatially dependent nature of the sparsifying transform, the NLST improves image quality beyond the TV approach. However, the filter is determined from intermediate images that contain heavy streak artifacts during the iterations. The resulting choice of an inaccurate or overly weak smoothing filter often generates isolated points in the image. On the other hand, the TV approach performs the same degree of smoothing by computing the differences between adjacent pixels at every pixel location. As this procedure is not spatially dependent, the TV does not cause isolated points. Therefore, in order to prevent the NLST algorithm from suffering from isolated points, we add to the NLST regularization term a TV regularization with a tiny weight, working only to eliminate the isolated points.

Finally, we can summarize the iterative algorithm of the NLST approach as Algorithm 2.
Algorithm 2: NLST reconstruction (for all pixels j in the image, apply the soft-thresholding update of Eq. (23)).

Experimental results

In this section, we investigated the effectiveness of the conventional TV method and the NLST method for the task of image reconstruction in sparse-view CT. In this study, we acquired images from the medical image library of The Japanese Society of Medical Imaging Technology. The images in this library are provided for medical image processing research. The image data format was DICOM (Digital Imaging and Communications in Medicine), but the patient information had been removed in advance; thus our study did not involve human participants. Before the study began, we obtained the approval from

Fig. 4(a) is the image reconstructed with 1000 iterations by the TV method. The implementation parameters were as follows: \alpha_0 = 100, \epsilon = 100 and the hyperparameter \beta = 50. Figs. 4(b-d) show images reconstructed by the NLST method. In this method, we set the hyperparameter to \beta = 35, and the step-size control parameters were \alpha_0 = 100 and \epsilon = 1000. Fig. 4(b) is the image reconstructed with 1000 iterations by the NLST method using the median filter, with a median filter window of 3×3 pixels. Fig. 4(c) is the image reconstructed with 1000 iterations by the NLST method using the bilateral filter; the implementation parameters for this image were a window size of 11×11 pixels, \delta_1 = 10000, and \delta_2 = 120. Fig. 4(d) is the image reconstructed with 1000 iterations by the NLST method using the NLM filter, whose parameters were a search window of 7×7 pixels, a patch window of 5×5 pixels, \delta_1 = 15, and \delta_2 = 15. In Fig. 4, small parts of the images are zoomed in and displayed on the left or right side. Significant image degradations can be clearly confirmed in the reconstructed image of Fig. 4(a).

As in Fig. 4, 72 projection views were used for image reconstruction, and the step-size control parameters were set to \alpha_0 = 100 and \epsilon = 100.
Also, 1000 iterations were carried out in this experiment. When the small hyperparameter value \beta = 1.0 was used, the strength of the TV regularization was so light that the streak artifacts (the part indicated by the arrow) could not be completely removed. On the contrary, when the excessively large hyperparameter value \beta = 100.0 was used, the reconstructed image was forced to be close to piecewise constant, which excessively smoothed the image textures (the part indicated by the arrow), and some of the textures were lost. As observed from this result, the TV method has limitations when applied to this severely ill-posed inverse problem.

Considering that the image characteristics of CT images differ significantly depending on the part of the human body, we applied the described methods to reconstruct CT images of four other body parts: the dental area, the cardiac area, the liver area, and

Appendix A

Here, we give a detailed explanation of how to solve the minimization problem expressed in Eq. (A.1). This minimization problem is the same as so-called TV denoising (the ROF model), and the classical algorithm to solve it is Chambolle's projection algorithm, which we describe below. The problem is

\min_{x} \frac{1}{2}\|x - \tilde{x}\|^2 + \lambda \sum_{i,j} |(\nabla x)(i,j)|,   (A.1)

where the vector x denotes the unknown N \times N image, and the vector \tilde{x} denotes the N \times N input image contaminated with noise. We use the symbol \nabla to represent the intensity gradient operator, which is defined by

(\nabla x)(i,j) = \left( \partial_h x(i,j), \; \partial_v x(i,j) \right),

where \partial_h x(i,j) and \partial_v x(i,j) denote the intensity gradients in the horizontal and vertical directions, respectively. We introduce two dual variables p_1 and p_2, both of which are