Deep-learning based retinal fluid segmentation in optical coherence tomography images using a cascade of ENets

Optical coherence tomography (OCT) is a non-invasive, painless and reproducible examination that allows ophthalmologists to visualize the retinal layers. This imaging modality is useful for detecting diseases such as diabetic macular edema (DME) or age-related macular degeneration (AMD), which are associated with fluid accumulations. In this paper, we propose a cascade of deep convolutional neural networks based on ENets for the segmentation of fluid accumulations in OCT B-Scans. After the B-Scans are denoised, a first ENet extracts the region of interest (ROI) between the inner limiting membrane (ILM) and Bruch's membrane (BM), and a second ENet segments the fluid within the ROI. A random forest classifier was then applied to the segmented fluid regions to reject false positives. Our framework was trained on three different datasets covering several diseases, such as diabetic retinopathy (DR) and AMD. Our method achieves an average Dice Score for fluid segmentation of 0.80, 0.83 and 0.83 on the UMN DME, UMN AMD and Kermany datasets, respectively.

Over the last decades, different approaches have been proposed to automatically segment fluid in OCT images. As fluid accumulations are often considered biomarkers of retinal diseases, their accurate segmentation is crucial for the diagnosis of different pathologies, as well as for evaluating the effectiveness of a treatment. Three types of fluid accumulation can occur in the retina: intraretinal fluid (IRF), subretinal fluid (SRF), and pigment epithelial detachment (PED).

Early papers on segmentation in OCT used methods based on thresholds and graph theory. Wilkins et al. [1] proposed an algorithm based on intensity thresholds for fluid segmentation in OCT images. The disadvantage of this approach is that it requires high image quality, which is a well-known challenge in the medical imaging field. Rashno et al. [2] used the graph cut method to segment fluid accumulations. These classical image processing algorithms have the drawback of a high computational time, which did on machine learning methods. Random forest classification [3], kernel regression [4] and fuzzy level set [5] methods have been implemented to perform fluid segmentation. These methods involve training a classifier on a large number of textural, structural or positional features extracted from OCT images. Machine learning approaches have allowed significant improvements in segmentation results over classical image processing.

In recent years, deep learning methods have been increasingly used in medical image processing. The U-Net architecture presented by Ronneberger et al. [6] has provided a real breakthrough in biomedical image segmentation. Inspired by the Fully Convolutional Network (FCN), U-Net combines deep semantic and spatial information through encoding and decoding blocks linked by skip connections. This architecture has achieved outstanding results in many medical image segmentation tasks and has been used for OCT segmentation, whether for fluid region segmentation [7], retinal layer segmentation [8], drusen segmentation [9], or intraretinal cystoid fluid segmentation [10]. Lu et al. [7], winners of the Retinal OCT Fluid Challenge (RETOUCH), used a U-Net to segment the three different types of fluid in B-Scans. Venhuizen et al. [10] proposed a cascade of two U-Nets, the first extracting the region of interest and the second segmenting the fluid regions. Chen et al. [11]

In this paper, we present a novel approach to automatically segment and quantify fluid in OCT B-Scans. Our pipeline starts with a preprocessing step that uses the BM3D algorithm to reduce speckle noise. Then, a cascade of deep convolutional neural networks based on ENets is trained to extract the region of interest (ROI) and to segment the fluid pixels within the ROI mask. To complete our segmentation pipeline, we refine the results with a post-processing step using a random forest classifier.
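The random forest post-processing operates on segmented fluid regions rather than on isolated pixels. The following sketch, which is not our actual implementation, shows one way to organize such a step in plain Python: connected fluid regions are extracted from the binary mask, a feature vector is computed for each region, and the regions rejected by a classifier are erased. The function names, the 4-connectivity and the two features (area and mean intensity) are illustrative assumptions; the `predict` argument stands in for a trained classifier, for example the `predict` method of a scikit-learn `RandomForestClassifier` fitted on the same features.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Mask = List[List[int]]
Image = List[List[float]]
Region = List[Tuple[int, int]]


def connected_regions(mask: Mask) -> List[Region]:
    """Return the (row, col) pixels of each 4-connected foreground region."""
    if not mask:
        return []
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions: List[Region] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited fluid pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def region_features(region: Region, image: Image) -> List[float]:
    """Hypothetical per-region features: area (in pixels) and mean intensity."""
    area = len(region)
    mean_intensity = sum(image[y][x] for y, x in region) / area
    return [float(area), mean_intensity]


def reject_false_positives(
    mask: Mask, image: Image, predict: Callable[[List[List[float]]], Sequence[int]]
) -> Mask:
    """Keep only the regions that `predict` labels as true fluid (label 1)."""
    cleaned = [[0] * len(row) for row in mask]
    regions = connected_regions(mask)
    if not regions:
        return cleaned
    labels = predict([region_features(reg, image) for reg in regions])
    for region, label in zip(regions, labels):
        if label == 1:
            for y, x in region:
                cleaned[y][x] = 1
    return cleaned


mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0]]
image = [[0.5] * 4 for _ in range(3)]
# Toy stand-in for a trained classifier: reject regions smaller than 3 pixels.
toy_predict = lambda feats: [1 if f[0] >= 3 else 0 for f in feats]
print(reject_false_positives(mask, image, toy_predict))
# [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
```

Working at the region level lets the classifier use shape and context features that a single pixel cannot provide.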

This article is organized as follows: the research materials and methods are described in the second section, including the details of our pipeline implementation and its

For the present study, three public OCT datasets were employed to train and test our method. Two of them were created by the University of Minnesota (UMN) [2] [20]. The OCT volumes were acquired with a Heidelberg Spectralis imaging system for 29 subjects with DME and 24 with AMD. Each B-Scan averages 12 to 19 frames, with a resolution of 5.88 × 3.87 µm/pixel. The fluid accumulations were manually segmented by two UMN ophthalmologists.

The Kermany dataset consists of 530 OCT volumes divided into three categories: DME, DRUSEN and NORMAL [21]. The images were also acquired with a Heidelberg Spectralis imaging system, and manual fluid segmentation was performed by three trained graders and approved by two ophthalmologists. The number of B-Scans per volume varied from 1 to 13, with most volumes containing only one or two.

[17], which is one of the most powerful image denoising methods thanks to its collaborative filtering. We also tested the Non-Local Means (NLM) algorithm presented by Buades et al. [19], which exploits the presence of similar features within an image. For NLM and BM3D, we used a search window of 21 pixels and a patch size of 7. Finally, we implemented the median filter (as a 1D filter of 5 pixels).
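The median filter is the simplest of the three denoising baselines. A minimal pure-Python sketch of a 5-pixel 1D median filter follows; applying it along each image row and replicating the border pixels are our assumptions, since the orientation and the boundary handling are not specified here. The function name is hypothetical. For NLM, a comparable setting would be, for instance, scikit-image's `denoise_nl_means` with `patch_size=7` and `patch_distance=10`, which searches a 21 × 21 window; whether that library was used in the original experiments is not stated.

```python
from statistics import median
from typing import List, Sequence


def median_filter_1d_rows(image: Sequence[Sequence[float]], size: int = 5) -> List[List[float]]:
    """Apply a 1D median filter of `size` pixels along each image row."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    filtered = []
    for row in image:
        row = list(row)
        # Replicate the border pixels so every output pixel sees a full window.
        padded = [row[0]] * half + row + [row[-1]] * half
        filtered.append([median(padded[i:i + size]) for i in range(len(row))])
    return filtered


# A bright speckle-like spike is removed, while a monotonic ramp (an "edge") is preserved.
print(median_filter_1d_rows([[0, 0, 9, 0, 0], [1, 2, 3, 4, 5]]))
# [[0, 0, 0, 0, 0], [1, 2, 3, 4, 5]]
```

This behavior illustrates why the median filter preserves edges well, while leaving more of the diffuse speckle than patch-based methods such as BM3D and NLM.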

Following the method of Venhuizen et al. [10], we decided, after denoising the B-Scans, to segment the region between the ILM and BM layers, where fluid accumulations can be located. We investigated four architectures from the literature, namely U-Net, SEU-Net, Seg-Net and ENet, in order to identify the most efficient one for the ROI

The second model of this pipeline generates a binary fluid segmentation mask. It takes two images as input: the B-Scan preprocessed by the BM3D algorithm and the output of the first model, namely the ROI mask. A major advantage of this approach is that the model can focus only on pixels that are likely to contain fluid. For this segmentation task, we tested the same four architectures to identify the most relevant one. We compared our performance with several approaches from the literature: the methods of Rashno et al. [2] and Ganjee et al. [16] for the UMN DME dataset, Rashno et al. [2] and Chen et al. [11] for UMN AMD, and Ganjee et al. [16] and Lu et al. [7] for Kermany.

The Tversky Index (TI) is a generalization of the Dice score that allows more flexibility in balancing false positives and false negatives by means of two scalar hyperparameters, α and β, as shown in Eq 2.
$$\mathrm{TI} = \frac{TP}{TP + \alpha\,FN + \beta\,FP} \qquad (2)$$

where TP represents the true positive, FN the false negative and FP the false positive pixels.

In Eq 1, γ ranges from 1 to 3. We trained both ENets with γ > 1 so that the loss function focuses more on less accurate pixel predictions. We optimized the hyperparameters, including the α, β and γ coefficients, by random search and obtained the best results with α = 0.6, β = 0.4 and γ = 4/3. Indeed, a higher value of α improves the model performance by reducing false negative predictions.
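Eq 1 is not reproduced in this section. Assuming it is the focal Tversky loss in the form popularized by Abraham and Khan, FTL = (1 − TI)^(1/γ), which matches the stated range of γ from 1 to 3, a soft (probabilistic) version of Eq 2 and of the loss can be sketched as follows, using the hyperparameters reported above as defaults. The smoothing term `eps` and all function names are our additions.

```python
from typing import Sequence, Tuple


def soft_counts(pred: Sequence[float], target: Sequence[int]) -> Tuple[float, float, float]:
    """Soft TP, FN and FP from per-pixel fluid probabilities and a binary ground truth."""
    tp = sum(p * g for p, g in zip(pred, target))
    fn = sum((1.0 - p) * g for p, g in zip(pred, target))
    fp = sum(p * (1.0 - g) for p, g in zip(pred, target))
    return tp, fn, fp


def tversky_index(pred, target, alpha=0.6, beta=0.4, eps=1e-7) -> float:
    """Eq 2: TI = TP / (TP + alpha*FN + beta*FP); eps avoids division by zero."""
    tp, fn, fp = soft_counts(pred, target)
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(pred, target, alpha=0.6, beta=0.4, gamma=4 / 3) -> float:
    """Assumed form of Eq 1: (1 - TI) ** (1 / gamma)."""
    ti = tversky_index(pred, target, alpha, beta)
    # Clamp at 0 so a rounding error can never produce a negative base.
    return max(0.0, 1.0 - ti) ** (1.0 / gamma)


# Same TP in both cases: one missed fluid pixel (FN) versus one spurious fluid pixel (FP).
loss_fn = focal_tversky_loss([1.0, 0.0, 0.0], [1, 1, 0])
loss_fp = focal_tversky_loss([1.0, 1.0, 0.0], [1, 0, 0])
print(loss_fn > loss_fp)  # True: with alpha > beta, a missed fluid pixel costs more
```

With α = β = 0.5, the Tversky index reduces to the Dice score, which is why it is described as a generalization of it.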
where S is the segmentation result and GT is the ground truth.
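For binary masks S and GT, the Dice score is 2|S ∩ GT| / (|S| + |GT|). The sketch below computes it, together with the IoU, precision and recall, from pixel-wise confusion counts on flattened masks. It is an illustration rather than our evaluation code, and returning 1.0 when a denominator is zero (for instance, a B-Scan with no fluid in either mask) is a convention we assume here.

```python
from typing import Dict, Sequence, Tuple


def confusion(seg: Sequence[int], gt: Sequence[int]) -> Tuple[int, int, int]:
    """Pixel-wise TP, FP and FN counts for two flattened binary masks."""
    tp = sum(1 for s, g in zip(seg, gt) if s and g)
    fp = sum(1 for s, g in zip(seg, gt) if s and not g)
    fn = sum(1 for s, g in zip(seg, gt) if not s and g)
    return tp, fp, fn


def _ratio(num: float, den: float) -> float:
    # Assumed convention: an empty denominator counts as a perfect score.
    return num / den if den else 1.0


def segmentation_metrics(seg: Sequence[int], gt: Sequence[int]) -> Dict[str, float]:
    tp, fp, fn = confusion(seg, gt)
    return {
        # |S| + |GT| = (TP + FP) + (TP + FN)
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
# {'dice': 0.5, 'iou': 0.3333333333333333, 'precision': 0.5, 'recall': 0.5}
```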

Three other metrics were evaluated as secondary metrics to validate our segmentation models: the Intersection over Union (IoU), precision and recall.

The results of the feature-based metrics clearly show that the median filter better preserves image features and has a short execution time, with SSIM and VIF scores 8% higher than those of the other methods. However, the median filter does not remove as much noise as the BM3D and NLM algorithms. Because BM3D removes speckle noise effectively, as reflected by its PSNR score, while preserving edges, we decided to preprocess our B-Scans with the BM3D algorithm.
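The PSNR is derived from the mean squared error (MSE) between a reference image and a test image: PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch follows. Which image served as the reference in the OCT denoising comparison is not specified in this section, and the function name and the default `max_value` of 255 (8-bit images) are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[Sequence[float]], test: Sequence[Sequence[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [v for row in reference for v in row]
    out = [v for row in test for v in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(round(psnr([[0.0, 0.0], [0.0, 0.0]], [[0.1, 0.1], [0.1, 0.1]], max_value=1.0), 6))  # 20.0
```

A higher PSNR means a smaller residual error, but unlike SSIM and VIF it says nothing about whether structures such as layer boundaries are preserved, which is why both kinds of metrics were considered.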

After preprocessing the B-Scans with the BM3D algorithm to reduce speckle noise, we trained four different network architectures to determine the best one for segmenting the ROI. We evaluated the models' performance with several metrics, as reported in Tab 2. These results allowed us to identify the most interesting architecture for ROI

Once the OCT B-Scans had been denoised with BM3D and the ROI segmented, we were able to perform fluid segmentation. As we did for the ROI, we determined the best architecture for segmenting fluid accumulations in OCT images. To this end, we trained four different architectures using the preprocessed and ROI-segmented B-Scans from the three datasets. Results are detailed in Tab 3. This comparative analysis allowed us to identify the most interesting

Rashno et al. [2] and Ganjee et al. [16] methods respectively for the UMN DME dataset. A smaller improvement was obtained for the Kermany dataset, with gains of 4% and 1% compared with Ganjee et al. [16] and Lu et al. [7], respectively. However, our pipeline did not

We evaluated the performance of our fluid quantification step using two metrics: the Pearson correlation coefficient ρ and the coefficient of determination R². We could not estimate the fluid surface for the Kermany dataset, because the resolution of the

Computer Assisted Intervention conference (MICCAI). We also evaluated our network on AMD OCTs. We compared our performance on the 5 OCT volumes of our test set with that of Rashno et al. [2] and found a very slight improvement of 1%. It was more difficult to compare our results with those of Chen et al. [11], who report a mean Dice Score of 94%, because it was not made clear which OCT volumes were used in the training and testing phases of their work.
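The fluid quantification evaluation compares the fluid surface estimated from the predicted masks with the surface measured on the manual annotations. A sketch under stated assumptions follows: the surface of one B-Scan is taken as the number of fluid pixels multiplied by the pixel area, here derived from the UMN resolution of 5.88 × 3.87 µm/pixel; ρ is the Pearson correlation between predicted and manual surfaces; and R² is computed as 1 − SS_res/SS_tot with the manual surfaces as reference. That R² definition is an assumption, since R² is sometimes reported instead as the squared correlation of a least-squares fit. All names are hypothetical.

```python
import math
from typing import Sequence

# UMN resolution reported above: 5.88 x 3.87 micrometres per pixel.
PIXEL_AREA_UM2 = 5.88 * 3.87


def fluid_surface_mm2(mask: Sequence[Sequence[int]], pixel_area_um2: float = PIXEL_AREA_UM2) -> float:
    """Fluid surface of one B-Scan in mm^2 (1 mm^2 = 1e6 um^2)."""
    n_fluid = sum(1 for row in mask for v in row if v)
    return n_fluid * pixel_area_um2 * 1e-6


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of `predicted` with respect to `reference`."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical surfaces (mm^2) for three B-Scans: manual annotation vs. prediction.
manual = [0.10, 0.25, 0.40]
predicted = [0.12, 0.24, 0.41]
print(pearson(manual, predicted), r_squared(manual, predicted))
```

Unlike ρ, this form of R² also penalizes a systematic over- or under-estimation of the fluid surface, so the two metrics are complementary.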
To avoid these issues, we

In order to assess the generalization and potential clinical application of the proposed framework, an additional experiment on a clinical routine dataset will be necessary in the future.

In this paper, we have described a novel approach to automatically segment fluid in OCT B-Scans. The proposed pipeline starts with a preprocessing step that reduces speckle noise using the BM3D algorithm. It then consists of a cascade of ENets, where the first one extracts the region of interest between the ILM and BM and the second one segments fluid accumulations. We complete our network with a post-processing step, training a random forest classifier to remove false positive pixel detections and thus improve our segmentation performance. The proposed method showed good performance, with a DSC over 80% on three different datasets associated with different types of diseases. In the future, we plan to test our method on a "real life" dataset to assess the generalization and clinical benefit of the proposed framework.