Abstract
Glioma is a highly invasive type of brain tumor that can appear in different parts of the brain with various sizes, shapes, and blurred borders. It is therefore challenging to identify the exact boundaries of the tumor in an MR image. In recent years, deep learning methods based on convolutional neural networks (CNNs) have gained popularity in image processing and have been used for accurate image segmentation in medical applications. However, the inherent limitations of CNNs mean that tens of thousands of images are needed in the training phase, and collecting and annotating such a large number of images poses a great challenge. Here, for the first time, we have optimized a network based on the capsule neural network, called SegCaps, to achieve accurate glioma segmentation in MR images. We compared our results with a similar experiment conducted using the commonly used U-Net. Both experiments were performed on the challenging BraTS2020 dataset. For U-Net, network training was performed on the entire dataset, whereas a subset containing only 20% of the whole dataset was used for SegCaps. The Dice Similarity Coefficient (DSC) was used to evaluate the results of the proposed method. SegCaps and U-Net reached DSCs of 87.96% and 85.56% on glioma tumor core segmentation, respectively. SegCaps uses convolutional layers as its basic components and has the intrinsic capability to generalize to novel viewpoints. The network learns the spatial relationships between features using dynamic routing of capsules. These capabilities of the capsule neural network led to an improvement of about 3% in glioma segmentation with less data, while the network contains 95.4% fewer parameters than U-Net.
I. Introduction
Gliomas are the most common fatal brain tumors and are caused by the abnormal growth of glial cells in the brain [1, 2]. They comprise different sub-regions, i.e., peritumoral edema, necrotic core, and enhancing and non-enhancing tumor core. Among the various medical imaging techniques, Magnetic Resonance Imaging (MRI) has become an efficient and standard method for brain tumor diagnosis because it provides good soft-tissue contrast [3, 4]. The glioma sub-regions appear clearly in four MRI sequences: T1-weighted (T1), T2-weighted (T2), T1-weighted with gadolinium contrast enhancement (T1-Gd) and Fluid Attenuated Inversion Recovery (FLAIR) [5]. Gliomas can appear in any part of the brain and are heterogeneous in shape, size and appearance, with blurred and irregular borders, which makes it very challenging to identify their exact boundaries in the image [6-8].
Brain segmentation is an essential step in medical procedures such as non-invasive image-guided brain surgery, where the outcomes of image visualization and image registration depend on accurate segmentation results [9-11]. Accurate visualization of brain structures plays an important role in many clinical applications such as cancer diagnosis [12], treatment verification [13], and image-guided interventions [14]. Over time, different conventional brain image segmentation methods have been developed, including manual segmentation, intensity-based methods [15-16], surface-based methods [17] and other deformable models. In manual segmentation, the process is performed by trained clinicians based on their skills, so the results depend on the user’s subjective judgment. Manual segmentation is also time-consuming. However, it is still necessary as a gold standard against which other methods are evaluated [5]. Despite many efforts to overcome the limitations of tumor segmentation algorithms, conventional solutions remain unsatisfactory because of the highly heterogeneous nature of gliomas. On the other hand, artificial intelligence, with tools recently provided by deep learning, has received a great deal of attention for many image processing applications such as image classification [18], image segmentation [19-24], and image reconstruction [25].
Convolutional Neural Networks (CNNs), among the most successful deep learning methods, provide accurate image segmentation results [26]. With this method, the network learns useful features automatically, without the need for manual feature selection. CNNs have been widely used in the literature with different approaches and have provided acceptable results in brain tumor segmentation [5, 27-31].
Despite the significant achievements recently reported in the literature, there are four main shortcomings associated with CNNs. First, because of its structural design, a CNN cannot maintain the dependencies between the parts of an object and the whole [32]. Second, the pooling layers use a type of routing that is not based on the human visual system. The pooling layer routes the important information extracted from the image to all the neurons of the next layer, so details or small objects in the image are lost. Third, in the pooling layer the information is routed statically from one layer to the next, and the neurons of the next layer are selected with no intuition. In the human visual system, by contrast, this is done dynamically, and neurons in the next layer can choose which information is important [33]. The fourth and main problem of CNNs is the amount of data required for training. CNN training requires tens of thousands of images, while preparing large datasets in medical applications is a very challenging task [34].
To overcome these limitations of CNNs, a novel class of neural networks was proposed by Sabour et al. [35], in which a group of neurons represents the existence of a feature. This group of neurons is called a “Capsule”, and the network made up of these blocks is called a “Capsule Neural Network” (also called CapsNet). The vertices of a CNN are neurons, whose output is a scalar. In a CapsNet, by contrast, the vertices are capsules and their output is a vector, which is a richer representation. The length of this vector indicates the probability that a specific entity exists in the input image. In the CapsNet, an advanced technique is used to connect the capsules between layers, which obtains the network weights through an iterative optimization strategy. In this routing technique, the output of the previous capsule is given as input to the next capsule and, during an iterative process, the input and output of the capsule are compared for similarity. Finally, the previous capsules are routed to the capsule that has a similar output (content correlation) [35]. Recently, CapsNets have been used in various real-world applications, one of the most important of which is medical image segmentation [36].
To the best of our knowledge, this is the first time that a CapsNet has been optimized for glioma segmentation in MR images. In our proposed method, the network can be trained with less data than commonly used CNNs. By using “routing by agreement”, this network detects the relationship between the parts of an object and the whole object through an iterative process.
II. Method and material
A. Dataset Description
To validate our algorithm accurately and compare it objectively with other available methods, a valid shared dataset is needed. Here we used the Brain Tumor Segmentation (BraTS) dataset for our experiments. BraTS is a well-accepted benchmark developed for automatic brain tumor segmentation in multimodal MRI scans of high-grade and low-grade gliomas [3]. BraTS2020, which is used in this experiment, contains MRI scans of 369 patients in four modalities (T1, T2, T1-Gd and FLAIR) with ground truth.
B. Capsule Network and routing by agreement algorithm
CapsNet is capable of identifying spatial and hierarchical relationships between objects in images. A capsule network is composed of capsule layers. In each layer, several capsules are trained with an iterative algorithm. Each capsule is made up of a number of neurons and its output is a vector.
This vector has two general characteristics: length and orientation. The vector length represents the probability that the entity indicated by the capsule corresponds to the current input. The orientation of the vector indicates the state of the object. A simple capsule structure is shown in Fig. 1.
The coupling weights between capsules are determined through the iterative routing-by-agreement algorithm [35]. In this process, a temporary variable ($b_{ij}$) is created for updating the weights and is set to zero at the beginning of the loop. In each iteration ($r$), this variable is updated for each capsule $i$ in layer $l$. In Fig. 1, $u_0, u_1, u_2, \ldots, u_i$ represent the output vectors of the previous-layer capsules (layer $l$), which encode the existence and state of features in low-level objects. Prediction vectors ($\hat{u}_{j|i}$) are obtained by multiplying $u_i$ by the corresponding weights. These weights express the relationship between the features obtained from lower-level capsules and higher-level capsules. $C_{ij}$ is a scalar weighting of the inputs, called the “coupling coefficient”, obtained by passing $b_{ij}$ through the softmax function. Initially, all routing weights are equal, which corresponds to the highest degree of uncertainty in routing the output vectors of the previous-layer capsules. In the next step, for capsule $j$ in layer $l+1$, the input vector is calculated as in (1):

$$S_j = \sum_i C_{ij}\,\hat{u}_{j|i} \tag{1}$$

where $S_j$ is the sum of the weighted inputs. Next, for all capsules in layer $l+1$, the capsule output vector ($V_j$) is obtained by passing $S_j$ through a vector-to-vector function as in (2):

$$V_j = \frac{\lVert S_j \rVert^2}{1 + \lVert S_j \rVert^2}\,\frac{S_j}{\lVert S_j \rVert} \tag{2}$$

This nonlinear function limits the length of the vector by squeezing it into the interval between zero and one without changing its direction, which is consistent with the notion that the length of the output vector has a probabilistic nature. Then, for all capsules in layer $l$ and all capsules in layer $l+1$, the temporary variable is updated as in (3):

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot V_j \tag{3}$$

In this equation, the agreement between the current output and $\hat{u}_{j|i}$ is measured by the dot product (Fig. 1). This routing algorithm calculates the output of capsule $j$ through an iterative process.
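The routing loop described by (1)-(3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names (`squash`, `softmax`, `route`), the array layout, the small `eps` constant and the default of three iterations are all choices made for the example, and the prediction vectors are taken as given rather than computed from learned weights.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Eq. (2): map the length of each vector into [0, 1), keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    # eps (an assumption of this sketch) avoids division by zero for null vectors
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, n_iters=3):
    """Routing by agreement.

    u_hat: prediction vectors with shape (n_lower, n_upper, dim),
           i.e. u_hat[i, j] is the prediction of lower capsule i for upper capsule j.
    Returns the upper-capsule outputs V (n_upper, dim) and the coupling
    coefficients C (n_lower, n_upper) used in the last iteration.
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))               # temporary variable b_ij, starts at zero
    for _ in range(n_iters):
        c = softmax(b, axis=1)                     # coupling coefficients over upper capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)      # Eq. (1): weighted sum of predictions
        v = squash(s)                              # Eq. (2): output vector of each upper capsule
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # Eq. (3): add the dot-product agreement
    return v, c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_hat = rng.normal(size=(6, 3, 4))             # 6 lower capsules, 3 upper capsules, 4-D vectors
    v, c = route(u_hat)
    print("output lengths:", np.linalg.norm(v, axis=-1))
    print("coupling coefficients per lower capsule sum to:", c.sum(axis=1))
```

With a single iteration the coupling coefficients are uniform, which matches the statement that all routing weights are initially equal; later iterations shift weight toward the upper capsules whose outputs agree with a lower capsule's prediction.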
In this section, the basic concepts of capsule network architecture and its routing algorithm were stated. In the next section, we will describe segmentation capsule network (SegCaps) architecture, which is an extension of the capsule network for segmentation tasks.
C. SegCaps architecture
Due to the complexity of the CapsNet, the network encounters runtime and memory constraints in image segmentation. LaLonde et al. [36] were the first to introduce a new architecture based on CapsNet, called SegCaps, to address segmentation problems. In the SegCaps framework, lower-layer capsules are routed to the next layer only within a specific spatial window, and the transformation matrices are shared between the capsules of each layer. The routing algorithm in SegCaps differs in its details from that of the CapsNet proposed in [35].
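$V_j$ in (2) is rewritten as $V_{x,y}$, the output of the capsule at spatial coordinates $(x, y)$, as in (4):

$$V_{x,y} = \frac{\lVert P_{x,y} \rVert^2}{1 + \lVert P_{x,y} \rVert^2}\,\frac{P_{x,y}}{\lVert P_{x,y} \rVert} \tag{4}$$

where $P_{x,y}$ is the capsule input. The agreement is defined as the dot product of the output vector and the corresponding prediction vectors, as in (5):

$$a_{x,y|i} = V_{x,y} \cdot \hat{u}_{x,y|i} \tag{5}$$

where $i$ indexes the lower-layer capsules inside the routing window. In addition, this architecture introduces a deeper network than CapsNet by developing the concept of de-convolutional capsules. As a result, the dimensions of the input image can be increased to 512 × 512, which makes the segmentation task possible.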
III. Experiments and results
A. Experimental Configuration
We implemented the proposed networks with the Keras and TensorFlow frameworks on an NVIDIA GeForce GTX 1080 Ti GPU. All training runs are performed using the Dice loss and the Adam optimizer. The data include MRI scans of 369 patients, with 70 slices used from each patient scan. 15% of these data are extracted randomly as a test dataset and the remaining 85% are used as the training and validation set. We used the same test subset to evaluate our results on SegCaps and U-Net. The image slices are cropped to 224 × 224 and fed to the network as 4-channel images (FLAIR, T1, T1-Gd and T2) with random augmentation and a batch size of one. Implementation details and specific settings for each network are as follows:
SegCaps: In this experiment, we randomly selected 20% of the total slices and trained the network in two steps. 80% of this subset is used for training and the remaining 20% is used as validation data. In the first phase, we trained the network with a learning rate of 0.001, a step decay of 1e-6 and a reconstruction weight of 20. The weights of the model from this phase are used as initial weights for the second phase of training, with a learning rate of 1e-7 and a reconstruction weight of zero.
U-Net: We trained the U-Net on the 85% of the whole dataset reserved for training and validation. 80% of this subset is used for training and 20% for validation. The network is trained using a learning rate of 1e-7 and a step decay of 1e-6 for 200 epochs.
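As a rough check of the data budget implied by the percentages above, the slice counts can be worked out with integer arithmetic. This is only a sketch under assumptions that the text does not confirm: splits are taken per slice rather than per patient, fractional counts are rounded down, and the SegCaps subset is counted as 20% of all slices. The helper `split_counts` is a hypothetical name introduced for the example.

```python
def split_counts(n_items, percent_first):
    """Split n_items into (first part, remainder), rounding the first part down."""
    first = n_items * percent_first // 100
    return first, n_items - first

N_PATIENTS = 369           # from the dataset description
SLICES_PER_PATIENT = 70    # slices used from each patient scan

total_slices = N_PATIENTS * SLICES_PER_PATIENT

# 15% test, 85% training + validation (shared by both networks)
test_slices, trainval_slices = split_counts(total_slices, 15)

# U-Net: 80/20 split of the whole training + validation pool
unet_train, unet_val = split_counts(trainval_slices, 80)

# SegCaps: 20% of the total slices (assumption: counted against all slices),
# then an 80/20 split of that subset
segcaps_subset = total_slices * 20 // 100
segcaps_train, segcaps_val = split_counts(segcaps_subset, 80)

if __name__ == "__main__":
    print("total slices:", total_slices)
    print("test / train+val:", test_slices, trainval_slices)
    print("U-Net train / val:", unet_train, unet_val)
    print("SegCaps subset, train / val:", segcaps_subset, segcaps_train, segcaps_val)
```

Under these assumptions, SegCaps sees roughly a quarter as many training slices as U-Net, which is the sense in which it is trained with less data.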
C. Analysis of Experiments
To evaluate our proposed approach quantitatively, BraTS2020 was used. Quantitative results for U-Net and SegCaps on the validation data are reported as the Dice Similarity Coefficient (DSC) in Table 1. DSC is calculated as in (6), where $P_1$ is the segmented tumor region and $T_1$ is the corresponding area in the ground truth mask [3]:

$$\mathrm{DSC} = \frac{2\,\lvert P_1 \cap T_1 \rvert}{\lvert P_1 \rvert + \lvert T_1 \rvert} \tag{6}$$

For qualitative evaluation, Fig. 2 shows the segmentation results of U-Net and SegCaps in comparison with the ground truth for three different glioma patients in multimodal MR images.
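The DSC in (6), and a Dice loss of the kind used for training, can be sketched as follows. This is an illustrative NumPy version, not the code used in the experiments: the function names, the convention of returning 1.0 when both masks are empty, and the `smooth` term in the loss are assumptions made for the example.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """DSC between two binary masks: 2|P ∩ T| / (|P| + |T|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks are empty; treating this as perfect agreement is a convention
        # chosen for this sketch.
        return 1.0
    return float(2.0 * intersection / denom)

def soft_dice_loss(prob, truth, smooth=1.0):
    """1 - soft Dice for predicted probabilities in [0, 1].

    `smooth` (an assumption of this sketch) keeps the ratio defined for empty masks.
    """
    prob = np.asarray(prob, dtype=float)
    truth = np.asarray(truth, dtype=float)
    intersection = np.sum(prob * truth)
    return float(1.0 - (2.0 * intersection + smooth) / (prob.sum() + truth.sum() + smooth))

if __name__ == "__main__":
    pred = np.array([[1, 1, 0],
                     [0, 0, 0]])
    truth = np.array([[1, 0, 0],
                      [1, 0, 0]])
    # One overlapping pixel, two pixels in each mask: 2 * 1 / (2 + 2)
    print(dice_coefficient(pred, truth))  # 0.5
```

The hard version operates on thresholded masks and suits evaluation, while the soft version is differentiable with respect to the predicted probabilities and therefore suits training.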
IV. Conclusion
To achieve accurate segmentation of glioma in MR images without the need for a huge training dataset, we have, for the first time, optimized the SegCaps architecture introduced in [36]. The results of our proposed approach are compared with those of U-Net, a commonly used network for medical image segmentation. At first, we tried to train SegCaps using the entire dataset; however, across different hyper-parameter settings, the DSC did not exceed 0.7 (70%) and the network did not converge, owing to the complexity of the BraTS dataset. In addition, the complex architecture and routing algorithm of capsules make convergence very challenging for multiple classification and segmentation tasks. We therefore used a subset of the dataset and trained the network in two steps. Since the capsule network has the intrinsic capability to generalize to novel viewpoints, SegCaps learns the spatial relationships between features using dynamic routing of capsules.
Therefore, the SegCaps network trained on a randomly selected subset of the dataset (selected so as to maintain the overall distribution of the dataset) is expected to produce results comparable to those of U-Net. The final quantitative results are shown in Table 1. With the proposed two-step training method, our experimental results show that SegCaps achieves an improvement of about 3% in DSC over U-Net on the validation data, while it uses less data for training and contains 95.4% fewer parameters than U-Net. It can be concluded that the main advantage of SegCaps is that it overcomes the problem of limited data, which is common in medical datasets.
This can also be observed qualitatively in Figure 2. The three columns contain MRI data from three patients, including T1, T1-Gd, FLAIR, and T2 sequences, in which the ground truth masks, the U-Net results, and the SegCaps results are marked on the T2 images, respectively.
In Figure 2, which is drawn from the test set, comparing the results of the two networks with the ground truth makes it clear that SegCaps has been successful in segmenting the enhancing tumor core area.
Despite the capabilities of SegCaps, its routing algorithm is much slower than backpropagation. In addition to being time-consuming, its computational complexity demands substantial GPU memory. However, since SegCaps uses convolutional layers as a basic component, it has the potential to be optimized for complex segmentation tasks in challenging medical images, without data size limitations, to achieve excellent results.
Footnotes
This work was supported by the Faculty of Medicine, Tehran University of Medical Sciences under grant number 49513.