Limited One-time Sampling Irregularity Age Map (LOTS-IAM): Automatic Unsupervised Detection of Brain White Matter Abnormalities in Structural Magnetic Resonance Images

We propose a novel unsupervised approach to detecting and segmenting white matter abnormalities, using limited one-time sampling irregularity age map (LOTS-IAM). LOTS-IAM is a fully automatic unsupervised approach to extract brain tissue irregularities in magnetic resonance images (MRI) (e.g., T2-FLAIR white matter hyperintensities (WMH)). In this study, the limited one-time sampling scheme is proposed and implemented on GPU. We compared the performance of LOTS-IAM in detecting and segmenting WMH with various methods, including the state-of-the-art unsupervised WMH segmentation method, the Lesion Growth Algorithm from the public Lesion Segmentation Toolbox (LST-LGA), and state-of-the-art supervised WMH segmentation methods based on convolutional neural networks (CNN). Based on our experiments, LOTS-IAM outperformed LST-LGA both in performance and processing speed. Furthermore, our proposed method also outperformed the conventional supervised machine learning algorithms support vector machine (SVM) and random forest (RF), and the supervised deep neural network algorithms deep Boltzmann machine (DBM) and convolutional encoder network (CEN).


Introduction
White matter hyperintensities (WMH) are common brain abnormalities found in brain magnetic resonance images (MRI) from patients with dementia/Alzheimer's Disease and other brain pathologies such as stroke and multiple sclerosis. WMH can be easily seen in T2-Fluid Attenuation Inversion Recovery (FLAIR) MRI as they appear brighter than the normal brain tissues. It is believed that WMH are associated with the progression of dementia and other comorbidities. Hence, not surprisingly, there have been many studies on methods for detecting or segmenting WMH automatically.
Supervised machine learning algorithms such as support vector machine (SVM), random forest (RF) (Ithapu et al., 2014) and deep learning convolutional neural network schemes, e.g. DeepMedic (Kamnitsas et al., 2017; Rachmadi et al., 2017a), uNet (Ronneberger et al., 2015; Li et al., 2018) and uResNet (Guerrero et al., 2018), have emerged as the state-of-the-art machine learning algorithms for automatic WMH segmentation. However, all supervised methods are highly dependent on manual labels produced by experts (i.e., physicians) for the training process. Furthermore, the quality of the labels depends on, and varies with, the expert's skill and opinion, which raises questions about reproducibility across different sets of data. These intra/inter-observer inconsistencies are usually quantified and reported, but this does not solve the problem.
Unsupervised machine learning algorithms, which do not need manual labels to work, can eliminate the aforementioned dependency. Methods such as the Lesion Growth Algorithm from the Lesion Segmentation Tool toolbox (LST-LGA) (Schmidt et al., 2012) and Lesion-TOADS (Shiee et al., 2010) have been developed, tested in many studies and are publicly available for unsupervised WMH segmentation. Unfortunately, their performance is very limited compared to that of the supervised methods (Ithapu et al., 2014; Rachmadi et al., 2017a).
An unsupervised method named irregularity age map (IAM) (Rachmadi et al., 2017b) and its faster version one-time sampling IAM (OTS-IAM) (Rachmadi et al., 2018c) have been recently proposed and reported to work better than LST-LGA, which is still the most commonly used method and the state-of-the-art for unsupervised WMH segmentation. This study completes the development of this IAM method by evaluating all its parameters and using other metrics to analyse its quality and applicability.
In summary, the main contributions of this study are: 1. Proposing a new approach named limited one-time sampling IAM (LOTS-IAM) which is faster than IAM (i.e., the original scheme) and OTS-IAM.

Irregularity Age Map
The irregularity age map (IAM) approach for WMH assessment on brain MRI was proposed in our previous work (Rachmadi et al., 2017b). It is based on a previous work in computer graphics (Bellini et al., 2016) to detect aged or weathered regions in texture images. The term "age map" names the 2D array of values between 0 and 1, dubbed age values, that denote irregularities in textures. The closer the value is to 1, the more probable it is that the image pixel/voxel belongs to a group of clusters with a texture different from that considered as the "norm". The age map can instead be calculated on structural MRI to detect abnormal regions within normal tissue. For this process, four steps are necessary: 1) preparation of the regions of interest where the algorithm will work (e.g. brain tissue mask), 2) patch generation, 3) age value calculation and 4) final age map generation. These four steps are visualised in Fig. 1 and described in the rest of this section. Note that steps 2 to 4 are executed slice by slice (i.e. in 2D).

Brain tissue mask generation
For brain MRI scans, the brain tissue mask is necessary to exclude non-brain tissues that are not needed in the calculation of IAM and which can represent "irregularities" per se; for example skull, cerebrospinal fluid, veins and meninges. We want to compare and identify brain tissues within other brain tissues, not skull or other non-brain tissues. For this purpose we use two binary masks: intracranial volume (ICV) and cerebrospinal fluid (CSF) masks, the latter also containing pial elements like veins and meninges. In our experiments, the ICV mask was generated by using optiBET (Lutkenhoff et al., 2014) while the CSF mask was generated by using an in-house algorithm developed by The University of Edinburgh (Valdés Hernández et al., 2015). However, several tools that produce accurate output exist and can be used for this purpose (e.g. bricBET 1 , freesurfer 2 ). The pre-processing step before computing LOTS-IAM only involves the generation of these two masks, as in the original IAM and in OTS-IAM (Rachmadi et al., 2017b, 2018b). This study also uses the normal appearing white matter (NAWM) mask to exclude non-white matter brain areas, as per OTS-IAM (Rachmadi et al., 2018b). NAWM masks were generated using the FSL-FLIRT tool (Jenkinson et al., 2002), but can also be generated using in-house tools or freesurfer, for example.

Patch generation
Patch generation produces two sets of patches: non-overlapping grid patches called source patches, and randomly sampled patches called target patches, which can geometrically overlap each other. Each source patch is compared with all target patches using a distance function to calculate the source patch's irregularity level. The rationale behind this is: if we successfully sample target patches mostly from normal brain tissues and calculate distance values between a source patch and all target patches, then irregular textures located within the source patch will produce high absolute distance values for the respective source patch. In this study, we use hierarchical subsets of four different sizes of source/target patches, which are 1 × 1, 2 × 2, 4 × 4 and 8 × 8. Unlike in the original study on natural images, where all possible target patches are used to produce the age map (Bellini et al., 2016), we use a set of randomly sampled target patches to accelerate the computation.

Figure 1: Flow of the proposed LOTS-IAM-GPU. 1) Pre-processing: brain tissue-only T2-FLAIR MRI 2D slices are generated from the original T2-FLAIR MRI and its corresponding brain masks (i.e., intracranial volume (ICV) and cerebrospinal fluid combined with pial regions (CSF)). 2) LOTS-IAM: the brain tissue-only T2-FLAIR MRI slice is processed through the LOTS-IAM algorithm on GPU. 3) Post-processing: final age map of the corresponding input MRI slice is produced after a post-processing step, which is optional.
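The two patch sets described above can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' GPU implementation: the function names `source_patches` and `sample_target_patches` are hypothetical, and `mask` is assumed to be a boolean brain tissue mask (inside ICV, outside CSF) with the same shape as the slice.

```python
import numpy as np

def source_patches(slice_2d, mask, size):
    """Non-overlapping grid patches whose centre lies inside the tissue mask."""
    h, w = slice_2d.shape
    patches = [
        slice_2d[r:r + size, c:c + size]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
        if mask[r + size // 2, c + size // 2]
    ]
    return np.stack(patches) if patches else np.empty((0, size, size))

def sample_target_patches(slice_2d, mask, size, n_samples, rng):
    """Randomly sampled patches (overlap allowed) with centre inside the mask."""
    h, w = slice_2d.shape
    half = size // 2
    # Entry (r, c) of centre_ok tells whether the patch with top-left corner
    # (r, c) fits in the slice and has its centre inside the mask.
    centre_ok = mask[half:half + h - size + 1, half:half + w - size + 1]
    rows, cols = np.nonzero(centre_ok)
    if len(rows) == 0:
        return np.empty((0, size, size))
    n = min(n_samples, len(rows))
    picked = rng.choice(len(rows), size=n, replace=False)
    return np.stack([slice_2d[r:r + size, c:c + size]
                     for r, c in zip(rows[picked], cols[picked])])
```

Calling both functions once per patch size (1, 2, 4 and 8) would give the four hierarchical source/target sets; a slice with an empty mask yields no source patches and can be skipped.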

Age value calculation
Age value calculation is the core computation of the IAM, where a distance value called age value is computed by using the function defined below. Let s be a source patch and t a target patch; the age value d of the two patches is:

$$d = \alpha \cdot \left|\max(s - t)\right| + (1 - \alpha) \cdot \left|\operatorname{mean}(s - t)\right| \tag{1}$$

where α = 0.5 in this study. Both maximum and mean values of the subtracted patches are used to include maximum and average differences between source and target patches in the calculation. Please note that source/target patches are matrices of size 1 × 1, 2 × 2, 4 × 4 or 8 × 8. Also, please note that each source patch is computed against a set of target patches, so each source patch has a set of age values. To get the final age value for one source patch, the set of age values corresponding to that source patch is sorted in ascending order and then the mean of the first 100 age values is calculated. The rationale is simple: the mean of the first 100 age values produced by an irregular source patch is still comparably higher than the mean of the first 100 age values produced by a normal source patch. All final age values from all source patches are then normalised to real values between 0 and 1 to create the age map for one MRI slice. Examples of age maps generated by using four different sizes of source/target patches are shown in Fig. 1.
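As an illustration of this step, the sketch below computes one final age value per source patch. It assumes the distance has the form d = α·|max(s − t)| + (1 − α)·|mean(s − t)|, which matches the description of the α-parameter given later in the text; the function names and the array layout (patches stacked along the first axis) are hypothetical choices, not the published implementation.

```python
import numpy as np

def age_values(sources, targets, alpha=0.5, n_mean=100):
    """sources: (S, k, k), targets: (T, k, k) -> one age value per source patch."""
    diff = sources[:, None, :, :] - targets[None, :, :, :]   # (S, T, k, k)
    d_max = np.abs(diff.max(axis=(2, 3)))                     # |max(s - t)|
    d_mean = np.abs(diff.mean(axis=(2, 3)))                   # |mean(s - t)|
    d = alpha * d_max + (1.0 - alpha) * d_mean                # assumed distance form
    d_sorted = np.sort(d, axis=1)                             # ascending per source
    n = min(n_mean, d.shape[1])
    return d_sorted[:, :n].mean(axis=1)                       # mean of the n smallest

def normalise(values):
    """Rescale age values to the range [0, 1] for one slice."""
    lo, hi = values.min(), values.max()
    return np.zeros_like(values) if hi == lo else (values - lo) / (hi - lo)
```

For example, a 2 × 2 source patch with a single voxel of intensity 1 compared against all-zero target patches gives d = 0.5·1 + 0.5·0.25 = 0.625, whereas a source patch identical to its targets gives 0.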

Final age map generation
The final age map generation consists of three sub-steps: blending the four age maps from the age value calculation, penalty, and global normalisation. Blending of the four age maps is performed by using the following formulation:

$$AM = \alpha \cdot AM_1 + \beta \cdot AM_2 + \gamma \cdot AM_4 + \delta \cdot AM_8$$

where α + β + γ + δ is equal to 1 and AM_1, AM_2, AM_4 and AM_8 are the age maps from 1 × 1, 2 × 2, 4 × 4 and 8 × 8 source/target patches. In this study, α = 0.65, β = 0.2, γ = 0.1 and δ = 0.05 are used as blending weights. Before the blending, the age maps resulting from different sizes of source/target patches are up-sampled to fit the original size of the MRI slice and then smoothed by using a Gaussian filter. The blended age map is then penalised using the formulation below:

$$p_o = p_i \times v_i$$

where p_i is a voxel from the blended age map, v_i is the corresponding voxel from the original MRI and p_o is the penalised voxel. Lastly, all age maps from different MRI slices are normalised together to produce values between 0 and 1 that give the probability of each voxel being an "irregularity" with respect to the normal brain tissue. We name this normalisation procedure global normalisation.
Visualisations of age value calculation, blending, penalty and global normalisation are shown in Fig. 1. Some important notes on the computation of the IAM are: 1) source and target patches need to have the same size within the hierarchical framework, 2) the centre of a source/target patch needs to be inside the ICV mask and outside the CSF mask at the same time to be included in the age value calculation and 3) any slice that does not provide any source patch (i.e., where no brain tissue is observed) is skipped to accelerate computation.
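The three sub-steps can be sketched as below. Several simplifications are assumptions of this sketch rather than statements about the original code: up-sampling is nearest-neighbour via `np.kron`, the Gaussian smoothing mentioned in the text is omitted to keep the example dependency-free, the slice dimensions are assumed to be divisible by 8, and the penalty is taken to be the voxel-wise product of the blended age map and the original MRI intensity. All function names are hypothetical.

```python
import numpy as np

# Blending weights (alpha, beta, gamma, delta) reported in the text.
WEIGHTS = (0.65, 0.20, 0.10, 0.05)

def upsample(age_map, factor):
    """Nearest-neighbour up-sampling: repeat each value factor x factor times."""
    return np.kron(age_map, np.ones((factor, factor)))

def blend_and_penalise(am1, am2, am4, am8, flair_slice, weights=WEIGHTS):
    """Blend the four age maps, then penalise with the original intensities."""
    maps = [upsample(am, f) for am, f in zip((am1, am2, am4, am8), (1, 2, 4, 8))]
    blended = sum(w * m for w, m in zip(weights, maps))
    return blended * flair_slice          # assumed penalty: p_o = p_i * v_i

def global_normalise(volume):
    """Normalise all slices of a scan together to the range [0, 1]."""
    lo, hi = volume.min(), volume.max()
    return np.zeros_like(volume) if hi == lo else (volume - lo) / (hi - lo)
```

Stacking the penalised slices of one scan into a 3D array before calling `global_normalise` keeps age values comparable across slices, which is the point of normalising globally rather than slice by slice.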

Limited one-time sampling IAM (LOTS-IAM)
While the original IAM has been reported to work well for WMH segmentation, its computation takes a long time because it performs one sampling process for each source patch, selecting different target patches per source patch. For clarity, we dub this the multiple-time sampling (MTS) scheme. The MTS scheme is performed in the original IAM to satisfy the condition that target patches should not be too close to the source patch (i.e., a location-based condition). The MTS scheme gives every source patch its own set of target patches, so the extra time needed to do sampling for each source patch is unavoidable.
To accelerate the overall IAM's computation, we propose one-time sampling (OTS) scheme for IAM where target patches are randomly sampled only once for each MRI slice, hence abandoning the location based condition of the MTS. In other words, age values of all source patches from one slice will be computed against one (i.e. the same) set of target patches. We call this combination of OTS and IAM one-time sampling IAM (OTS-IAM). The OTS-IAM was proposed in our previous study (Rachmadi et al., 2018c).
In this study, we propose to limit the number of target patches to accelerate the overall IAM's computation. The original IAM and OTS-IAM, which run on CPUs, use an undefined large random number of target patches which could range from 10% to 75% of all possible target patches, depending on the size of the brain tissue in an MRI slice. We name our new scheme limited one-time sampling (LOTS) IAM or LOTS-IAM. The LOTS scheme enables us to implement IAM on GPU to accelerate IAM's computation even more.
We tested 6 different numbers of target patch samples: 2048, 1024, 512, 256, 128 and 64. Because it is possible for LOTS-IAM to use fewer than 100 target patch samples, we also modified the number of samples used to calculate the mean of the age values. For LOTS-IAM, the first 128, 128, 64, 32, 32 and 16 age values are used to calculate the mean of age values for 2048, 1024, 512, 256, 128 and 64 target patch samples respectively. Limiting the number of samples to powers of two eases GPU implementation, especially GPU memory allocation, which is the case for LOTS-IAM.
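The configurations listed above can be captured in a small lookup table. The sketch below is a CPU-side NumPy illustration of the LOTS idea (one fixed-size target set per slice, shared by all source patches), not the GPU code. `lots_age_values` and `config_name` are hypothetical names, and the distance is assumed to be α·|max(s − t)| + (1 − α)·|mean(s − t)|.

```python
import numpy as np

# Number of target patch samples -> number of smallest age values averaged.
LOTS_CONFIGS = {2048: 128, 1024: 128, 512: 64, 256: 32, 128: 32, 64: 16}

def config_name(n_samples):
    """Build a configuration label in the style used in the text."""
    return f"LOTS-IAM-GPU-{n_samples}s{LOTS_CONFIGS[n_samples]}m"

def lots_age_values(sources, candidates, n_samples, alpha=0.5, seed=0):
    """sources: (S, k, k); candidates: (C, k, k) pool of possible target patches."""
    rng = np.random.default_rng(seed)
    n_draw = min(n_samples, len(candidates))
    picked = rng.choice(len(candidates), size=n_draw, replace=False)
    targets = candidates[picked]              # sampled once, shared by all sources
    diff = sources[:, None] - targets[None]   # (S, n_draw, k, k)
    d = (alpha * np.abs(diff.max(axis=(2, 3)))
         + (1.0 - alpha) * np.abs(diff.mean(axis=(2, 3))))
    k = min(LOTS_CONFIGS[n_samples], n_draw)
    smallest = np.partition(d, k - 1, axis=1)[:, :k]   # k smallest, any order
    return smallest.mean(axis=1)
```

Because the target set has a fixed, known size, the distance matrix has a fixed shape per slice, which is what makes a pre-allocated GPU buffer straightforward; `np.partition` is used here because only the k smallest values are needed, not a full sort.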

MRI Data, Other Machine Learning Algorithms and Experiment Setup
A set of 60 T2-Fluid Attenuation Inversion Recovery (T2-FLAIR) MRI data from 20 subjects from the ADNI database was used for evaluation. Each subject had three scans obtained in three consecutive years. All of them were selected randomly and blind to any clinical, imaging or demographic information. All T2-FLAIR MRI sequences have the same dimension of 256 × 256 × 35. Full data acquisition information can be found in our previous study (Rachmadi et al., 2018b). Ground truth was produced semi-automatically by an expert in medical image analysis using the region-growing algorithm in the Object Extractor tool of Analyze™ software, guided by the co-registered T1- and T2-weighted sequences. For more details of the dataset, please see (Rachmadi et al., 2017a) and data-share url 3 to access the dataset.
Data used in this study were obtained from the ADNI (Mueller et al., 2005) public database 4 . The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found here 5 .
We compare the performance of LOTS-IAM with other machine learning algorithms that are commonly used for WMH segmentation; namely the original IAM, One-time Sampling IAM (OTS-IAM), Lesion Growth Algorithm from Lesion Segmentation Tool (LST-LGA), support vector machine (SVM), random forest (RF), deep Boltzmann machine (DBM), convolutional encoder network (CEN), patch-based 2D CNN with global spatial information (2D patch-CNN-GSI), patch-uResNet and patch-uNet. LST-LGA (Schmidt et al., 2012) is the current state-of-the-art for unsupervised hyperintensities segmentation. SVM and RF are machine learning algorithms commonly used for WMH segmentation in several studies (Rachmadi et al., 2017a), and they are used in this study as representatives of supervised conventional machine learning algorithms. On the other hand, DBM, CNN and U-Net based methods represent supervised deep learning algorithms which have been commonly used in recent years for WMH segmentation. For clarity, we do not elaborate further on the implementation of these algorithms. The experimental setup (i.e., training/testing and algorithm configurations) for the SVM, RF, DBM and CEN algorithms is described in detail in (Rachmadi et al., 2017a).
Dice similarity coefficient (DSC) (Dice, 1945), which measures similarity between ground truth and automatic segmentation results, is used here as the primary metric for comparison between algorithms. A higher DSC score means better performance, and the DSC score itself can be computed as follows:

$$DSC = \frac{2 \times TP}{FP + FN + 2 \times TP}$$

where TP is true positive, FP is false positive and FN is false negative. The additional metrics positive predictive value (PPV), specificity (SPC) and true positive rate (TPR) are also calculated. Non-parametric Spearman's correlation coefficient (Myers et al., 2010) is used to compute the correlation between the WMH volume produced by each automatic method and visual ratings of WMH. Visual ratings of WMH are commonly used in clinical studies to describe and analyse the severity of white matter disease (Scheltens et al., 1993). Correlation between visual ratings and volume of WMH is known to be high (Hernández et al., 2013). In this study, Fazekas's (Fazekas et al., 1987) and Longstreth's visual rating scales (Longstreth et al., 1996) are used for non-parametric evaluation of each automatic method. This non-parametric test is similar to that in a previous study (Rachmadi et al., 2017a). Table 1 shows the overall results of the performance of all methods that we have tested. Please note that the original IAM is listed as IAM-CPU.
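The four voxel-wise metrics follow directly from the confusion counts. The sketch below uses the standard definitions DSC = 2TP / (2TP + FP + FN), PPV = TP / (TP + FP), SPC = TN / (TN + FP) and TPR = TP / (TP + FN); the function name and the convention of returning 0.0 for an empty denominator are assumptions made for illustration.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Voxel-wise DSC, PPV, SPC and TPR for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))

    def ratio(num, den):
        # Convention of this sketch: report 0.0 when the denominator is empty.
        return num / den if den > 0 else 0.0

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),
        "PPV": ratio(tp, tp + fp),
        "SPC": ratio(tn, tn + fp),
        "TPR": ratio(tp, tp + fn),
    }
```

Note that SPC is dominated by the very large number of true negatives in brain MRI, so it tends to be close to 1 for most methods and is less discriminative than DSC, PPV or TPR.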

General Experiment Results
From Table 1, we can see that all IAM configurations (i.e., IAM-CPU, OTS-IAM-CPU and LOTS-IAM-GPU methods) outperformed LST-LGA in mean DSC, positive predictive value (PPV), specificity (SPC) and true positive rate (TPR) metrics. Furthermore, we also can see that performances of IAM/OTS-IAM/LOTS-IAM not only outperformed LST-LGA but also some other supervised machine learning algorithms (i.e., SVM, RF and DBM). Moreover, some LOTS-IAM-GPU implementations also successfully outperformed CEN in four evaluated metrics and performed better than CNN based algorithms (i.e., CEN, Patch-uResNet, Patch-uNet and 2D Patch-CNN-GSI) in either PPV, SPC or TPR metrics. The best value for each evaluation metric is written in bold letters.
Evaluation metric values listed in Table 1 are extracted by using optimum threshold value of each algorithm. In this study, the optimum threshold value is based on DSC metric. We produced DSC performance curves for each algorithm similar to Figure 2. LGA and Patch-uResNet are compared in Figure  4 (raw) and Figure 5 (cut off). Figure 6 shows how LOTS-IAM could potentially be applied to characterise abnormalities using other MRI sequences, such as T1-weighted (T1W).
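Selecting a DSC-based optimum threshold amounts to sweeping candidate cut-offs over the probability (age) map and keeping the one with the highest DSC. The sketch below does this for a single scan; the function names and the default threshold grid are hypothetical, and in practice the score would presumably be averaged over all scans before picking one threshold per algorithm.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient of two boolean masks."""
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    den = 2 * tp + fp + fn
    return 2 * tp / den if den > 0 else 0.0

def optimum_threshold(prob_map, truth, thresholds=None):
    """Return (best threshold, best DSC) over a grid of candidate thresholds."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)   # assumed grid, step 0.05
    truth = np.asarray(truth, dtype=bool)
    scores = [dice(np.asarray(prob_map) >= t, truth) for t in thresholds]
    best = int(np.argmax(scores))                   # first maximum on ties
    return float(thresholds[best]), scores[best]
```

Plotting `scores` against `thresholds` gives a DSC performance curve for one method on one scan.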

IAM vs. OTS-IAM vs. LOTS-IAM
One-time sampling (OTS) and limited one-time sampling (LOTS) not only successfully accelerated IAM's computation but also improved IAM's performance, as shown in Table 1. Implementation of IAM on GPU successfully accelerated IAM's processing speed by 17 to 435 times with respect to the original IAM-CPU. However, it is worth stressing that this increase in processing speed was not only due to the use of GPU instead of CPU, but also depended on the number of target patch samples used in the IAM's computation. One of the GPU implementations of LOTS-IAM (i.e., LOTS-IAM-GPU-64s16m) ran faster than LST-LGA. Note that the testing time listed in Table 1 excludes registrations and the generation of other brain masks used either in pre-processing or post-processing steps. The increase in speed achieved by the GPU implementation of IAM shows the effectiveness of the LOTS implementation for IAM's computation and performance.

Table 1: Algorithm's information and experiment results based on the Dice similarity coefficient (DSC), positive predictive value (PPV), specificity (SPC) and true positive rate (TPR) for each algorithm evaluated (the best value is written in bold). Explanation of abbreviations: "SPV/UNSPV" for supervised/unsupervised, "Deep Net." for deep neural networks algorithm, "Y/N" for Yes/No, "T2F/T1W" for T2-FLAIR/T1-weighted, "#MTPS" for maximum number of target patches, "#meTPS" for number of target patches used for calculating mean of age value, "TRSH" for optimum threshold and "Training/Testing" for training/testing time. Given "speed increase" is relative to IAM-CPU.

Evaluation of Speed vs. Quality in LOTS-IAM
The biggest achievement of this work is the increase in processing speed achieved by the implementation of LOTS-IAM on GPU, compared to the original IAM and OTS-IAM. The first iteration of IAM can only be run on CPU because it uses multiple-time sampling (MTS). OTS-IAM samples patches only once, but still uses a high number of target patches (i.e., 2,048 samples) to compute the age map. In this study, we show that using a limited number of target patches leads to not only faster computation but also, in some cases, better quality of WMH segmentation based on mean DSC. Figure 3 illustrates the relation between speed and quality of the output (mean DSC) produced by IAM, OTS-IAM and all configurations of LOTS-IAM. Please note that Figure 3 is extracted from Table 1. On the other hand, LOTS-IAM-GPU using more target patches produced better PPV and SPC values than LOTS-IAM-GPU using fewer target patches. This is reversed for the TPR metric, where using fewer target patches is better than using more target patches.

Figure 3: [...] (see Table 1). By implementing IAM on GPU and limiting the number of target patch samples, both computation time and result quality are successfully improved.

Analysis of IAM's Blending Weights
In this experiment, different sets of blending weights in IAM's computation were evaluated. As previously discussed, the only parameters that IAM has are the four weights used to blend the four hierarchically produced age maps into the final age map. We tested the 7 different sets of IAM's blending weights listed in Table 2. The first 3 sets blend all four age maps while the other 4 only use one of the four age maps.
From Figure 7, we can see that blending all four age maps improves IAM's performance and works better than using only one of the four available age maps. Based on the results listed in Table 2, blending weights of 0.65, 0.20, 0.10 and 0.05 for age maps produced from 1 × 1, 2 × 2, 4 × 4 and 8 × 8 source/target patches respectively gives the best DSC score of 0.4434. As this combination produced the best DSC score in this evaluation, we made this set the default set for IAM computation. Coincidentally, this set of blending weights has been used from the start of IAM's development and also used in the first paper of IAM (Rachmadi et al., 2017b).
The fact that combining the age maps of different patch sizes performs best suggests that it is necessary to consider not only the colour of the individual pixels but also the local distribution of the pixel colours to correctly label WMH lesions.
Still, the individual pixel colour appears to be a strong feature for classification, as the best performing blending weights (0.65, 0.2, 0.1 and 0.05) give the largest weight to the 1 × 1 age map.

Analysis of IAM's Distance Function's α-Parameter
IAM's distance function (Equation 1) has the α-parameter, which controls the blending of the maximum difference and the mean difference between source and target patches. Only the mean difference is used when α = 0, whereas only the maximum difference is used when α = 1. The reason behind the use of the α-parameter is that we might want to calculate the distance value based on the mean difference only, the maximum difference only, or a combination of mean and maximum differences.

Table 2: [...] Figure 7. LOTS-IAM tested in this experiment is LOTS-IAM-GPU-512s64m.

Based on the results depicted in Figure 8 and Table 3, we can see that combining/averaging mean and maximum differences using α = 0.5 produced the best mean DSC, though the performance differences are minimal. Coincidentally, α = 0.5 has been used from the start of IAM's development and was also used in the first paper on IAM (Rachmadi et al., 2017b).

Figure 8: [...] (see Table 3). LOTS-IAM used in this experiment is LOTS-IAM-GPU-512s64m.

Analysis of IAM's Random Sampling Scheme
To automatically detect FLAIR's WMH without any expert supervision, IAM works on the assumption that normal brain tissue is predominant compared with the extent of abnormalities. Due to this assumption, random sampling is used in the computation of IAM to choose the target patches. However, it raises an important question on the stability of IAM's performance to produce the same level of results for one exact MRI data, especially using different numbers of target patches.

Table 4: Mean and standard deviation values of DSC distributions for each IAM's setting depicted in Figure 9. Each of IAM's setting is tested on a random MRI data 10 times.

In this experiment, we randomly chose one MRI data out of the 60 MRI data that we have, and ran LOTS-IAM-GPU multiple times (i.e., 10 times in this study) using different numbers of target patch samples. Each result was then compared to the ground truth, grouped together and plotted as boxplots (Figure 9) and listed in Table 4. Figure 9 and Table 4 show that the deviation of IAM's computation for one MRI data is small in all different settings of LOTS-IAM-GPU. However, increasing the number of target patches in IAM's patch generation further alleviates this deviation, as shown in Figure 9, where the boxplots produced by LOTS-IAM-GPU-2048s128m are narrower than the ones produced by LOTS-IAM-GPU-64s16m (see Table 4 for quantification).

WMH Burden Scalability Test
In this experiment, all methods were tested and evaluated to see their performance on WMH segmentation for different volumes of WMH (i.e., WMH burden). The DSC metric is still used in this experiment, but the dataset is grouped into three different groups based on the WMH burden of each patient. The groups are listed in Table 5 while the results can be seen in Figure 10 and Table 6. Please note that IAM is represented by LOTS-IAM-GPU-512s64m as it is the best performer amongst the IAM methods (see Table 1). From Figure 10, it can be appreciated that LOTS-IAM-GPU-512s64m performed better than LST-LGA in this experiment, outperforming LST-LGA's performance distribution in all groups. LOTS-IAM-GPU-512s64m also performed better than the conventional supervised machine learning algorithms (i.e. SVM and RF) in the Small and Medium groups. LOTS-IAM-GPU-512s64m's performance was on a par with, if not better than, that of the supervised deep neural network algorithms DBM and CEN. However, LOTS-IAM-GPU-512s64m still could not beat the state-of-the-art supervised deep neural network 2D patch-CNN in any group.
To make this observation clearer, Table 6 lists the mean and standard deviation values that correspond to the box-plot shown in Figure 10. From both Figure 10 and Table 6 it can be observed that the deviation of IAM's performance in the Small WMH burden group is still high compared to the other methods evaluated. However, IAM's performance is more stable in the Medium and Large WMH burden groups.

Table 6: Mean and standard deviation values of dice similarity coefficient (DSC) score's distribution for all methods tested in this study in respect to WMH burden of each patient (see Table 5). Note that LOTS-IAM-GPU-512s64m is listed as LIG-512s64m in this table.

Longitudinal Test
In this experiment, we evaluate spatial agreement between the results produced in three consecutive years. For each subject, we aligned the Year-2 (Y2) and Year-3 (Y3) MRI and derived data to the Year-1 (Y1), subtracted the aligned WMH labels of the baseline/previous year from the follow-up year(s) (i.e., Y2-Y1, Y3-Y2, and Y3-Y1), and then labelled each voxel as 'grow' if it has a value above zero after subtraction, as 'shrink' if it has a value below zero after subtraction, and as 'stay' if it has a value of zero after subtraction and one before subtraction. This way, we can see whether the method captures the progression of WMH across time (i.e., longitudinally). An example of the output from this experiment is shown in Figure 12, where sections of the original FLAIR MRI, LOTS-IAM, LST-LGA and uNet across three respective years (Y1, Y2 and Y3) for a subject are depicted. Figure 11 summarises the results listed in Table 7 for all methods (i.e., LST-LGA, LOTS-IAM-GPU-512s64m, Patch-uNet, Patch-uResNet and 2D Patch-CNN-GSI). We can see that LOTS-IAM-GPU-512s64m outperforms LST-LGA and competes with the deep neural network methods Patch-uNet, Patch-uResNet and 2D Patch-CNN-GSI, where LOTS-IAM-GPU-512s64m is the second best performer after Patch-uResNet in this longitudinal evaluation. This, again, confirms that LOTS-IAM shows comparable performance with the state-of-the-art deep learning convolutional neural network methods.
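The voxel labelling rule described above can be written compactly. The sketch assumes the two inputs are binary WMH masks that have already been aligned to the same space; the integer label codes and the function name are hypothetical choices made for illustration.

```python
import numpy as np

BACKGROUND, GROW, SHRINK, STAY = 0, 1, 2, 3   # hypothetical label codes

def progression_labels(baseline, follow_up):
    """Label each voxel from two aligned binary WMH masks (baseline, follow-up)."""
    baseline = np.asarray(baseline, dtype=np.int8)
    follow_up = np.asarray(follow_up, dtype=np.int8)
    diff = follow_up - baseline                        # e.g. Y2 - Y1
    labels = np.full(diff.shape, BACKGROUND, dtype=np.int8)
    labels[diff > 0] = GROW                            # WMH only at follow-up
    labels[diff < 0] = SHRINK                          # WMH only at baseline
    labels[(diff == 0) & (baseline == 1)] = STAY       # WMH at both time points
    return labels
```

Applying the function to the pairs (Y1, Y2), (Y2, Y3) and (Y1, Y3) gives the three comparisons used in this experiment; voxels that are WMH in neither scan remain background.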

Correlation with Visual Rating
In this experiment, we want to see how closely IAM's results correlate with visual ratings of WMH, specifically Fazekas's visual ratings (Fazekas et al., 1987) and Longstreth's visual ratings (Longstreth et al., 1996).
The correlation was calculated by using Spearman's correlation. The correlation coefficients calculated were: 1) between the total Fazekas's rating (i.e., the sum of periventricular white matter hyperintensities (PVWMH) and deep white matter hyperintensities (DWMH)) and manual/automatic WMH volumes and 2) between Longstreth's rating and manual/automatic WMH volumes. The results are listed in Table 8. Table 8 shows that, although not much better, all LOTS-IAM-GPU methods highly correlate with visual rating clinical scores. Despite LST-LGA's output having a lower mean DSC (see Table 1) compared to other methods, it still highly correlates with visual ratings. LOTS-IAM-GPU implementations have high values of both mean DSC and correlation with visual ratings.

Figure 12: [...] LGA and uNet. LOTS-IAM produces richer information of the WMH progression than LST-LGA/uNet as age maps of LOTS-IAM preserve the underlying variance of WMH's intensity, giving a good perspective on how WMH grow over time.
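Spearman's correlation is the Pearson correlation of the ranks, with tied values (common in ordinal visual ratings) receiving their average rank. The dependency-free sketch below illustrates the computation; the function names are hypothetical, and a library routine such as `scipy.stats.spearmanr` could be used instead.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(x, y):
    """Spearman's rank correlation between two equally long sequences."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

In this setting `x` would hold one visual rating per scan (total Fazekas or Longstreth score) and `y` the WMH volume produced by one segmentation method for the same scans.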

Conclusion and Future Work
The optimisation of IAM presented (LOTS-IAM-GPU) improves both performance and processing time with respect to previous versions of IAM. Despite not being a WMH segmentation method per se, it can be successfully applied for this purpose. Being unsupervised confers an additional value to this fully automatic method as it does not depend on expert-labelled data, and therefore is independent from any subjectivity and inconsistency from human experts, which usually influence supervised machine learning algorithms. Furthermore, our results show that LOTS-IAM also successfully outperformed some supervised deep neural network algorithms, namely DBM and CEN. Some improvements could still be made by adding or using different sets of brain tissue masks other than CSF and NAWM, for example a cortical mask and a cerebrum brain mask.
One major drawback of the original IAM is the long computation time it takes to process a single MRI data. LOTS-IAM-GPU successfully speeds up IAM's computation by 17 to 435 times, owing not only to its implementation on GPU, but also to the use of a limited number of target patch samples. LOTS-IAM-GPU also outperforms LST-LGA, the current state-of-the-art method for unsupervised WMH segmentation, in both DSC metric and processing speed.
IAM could provide unsupervised labels for pre-training supervised deep neural network algorithms and for transfer learning, but this has not yet been explored in depth. In our preliminary research, IAM is used in task adaptation transfer learning and in predicting brain abnormalities progression/regression (Rachmadi et al., 2018a). Due to its nature, we hypothesise that it is also applicable to segmenting brain lesions in CT scans or other brain pathologies. Further work should also explore its implementation in a multispectral approach that combines different MRI sequences. The implementation of LOTS-IAM-GPU is publicly available at https://github.com/febrianrachmadi/lots-iam-gpu.