ABSTRACT
Rigid and affine registration to a common template is one of the vital steps in pre-processing brain structural magnetic resonance imaging (MRI) data to make them suitable for further processing. Manual quality check (QC) of these registrations is tedious when the data contain several thousand images. Therefore, I propose a machine learning (ML) framework for fully automatic QC of registrations based on local computation of similarity cost functions such as normalized cross-correlation, normalized mutual information and correlation ratio, which serve as features for training ML classifiers. An MRI dataset consisting of 215 subjects from the autism brain imaging data exchange was used for 5-fold cross-validation and testing. To enable supervised learning, misaligned images were generated. Most of the classifiers reached testing F1-scores of 0.98 for checking rigid and affine registrations, so these ML models could be deployed for practical use.
1. INTRODUCTION
Owing to advances in larger-scale magnetic resonance imaging (MRI) data acquisition and to these data being made publicly available, researchers now process large numbers of subjects, turning their work into big data brain MRI studies [1, 2]. Publicly available brain MRI big data sets include HCP (aging, young adult, development), IXI, ADNI, ABIDE, OASIS, UK Biobank and NAKO Germany [3]. With advances in high-performance computing and the availability of these big data, brain studies on larger populations have become feasible, turning big data into big brain science [4, 5]. Structural MRI is the primary modality for studies involving whole-brain voxel-based morphometry (VBM), surface-based morphometry and profiling of deep gray matter structures [6–8].
Quality control (QC) has become an important step at every stage, from assessing raw image quality to final post-processing. Checking the data for quality before pre-processing is the first stage of QC, and several studies have proposed metrics for quantitative assessment of raw image quality [9, 10].
Once initial QC has passed, the acquired raw structural MRI images have to undergo several standard pre-processing steps such as reorientation to a standard template, cropping, bias correction and alignment of the images from subject space to a common space. Transforming images to a common space is one of the crucial pre-processing steps that makes them suitable for further processing such as VBM. These transformations can involve both rigid and affine registrations, with 6 and 12 degrees of freedom respectively. QC of these registrations is necessary to ensure that they are suitable for subsequent analysis. In studies involving a few hundred images, manual checking is feasible and yields 100 percent accuracy. However, in big data studies involving tens of thousands of images, manual inspection is a tedious, tiring and time-consuming task, and is moreover prone to inter- and intra-observer variability, especially for doubtful cases. This raises the need for a fully automatic quality control mechanism that reduces manual intervention to a minimum.
Machine learning (ML) is in wide use in medical imaging for tasks such as assessment of image quality, brain mapping, and disease diagnosis and prognosis [11–13]. Regarding registration quality, the traditional similarity cost functions in use are the sum of squared differences, normalized cross-correlation (NCC), joint-entropy-based measures such as normalized mutual information (NMI), and the correlation ratio (CR) [14]. However, computing them locally and combining them with ML classifiers may yield better accuracy than any of the individual cost functions alone.
To my knowledge, this is the first study aiming at development and comparison of different ML models for fully automatic QC of registrations. Hence, I propose to develop supervised ML models that could be deployed for use in big data structural MRI pre-processing.
2. METHODS
This section describes the dataset used for cross-validation and testing of the classifier models, the generation of misaligned images for supervised learning, the computation of local cost values, and the cross-validation and testing of different ML models.
2.1. Dataset
The data for cross-validation and testing consisted of a subset of high-resolution T1-weighted (MPRAGE) MRI images of 215 subjects extracted from the publicly available autism brain imaging data exchange (ABIDE I). Of the 215 subjects, 110 had autism [2].
2.2. Pre-processing
The data were pre-processed in a Nipype environment by calling the required third-party interfaces such as ANTs, FSL, FreeSurfer and other required software [15]. First, the raw images were reoriented to a standard space and cropped to remove the neck region using the FSL tools fslreorient2std and robustfov respectively [16]. The cropped images were then bias-corrected to remove low-frequency intensity variations caused by scanner magnetic field inhomogeneity, using the N4 interface from ANTs [17]. Next, the images were aligned to the 1×1×1 mm³ standard T1 template available in FSL, both rigidly (translation and rotation) and affinely (translation, rotation, scaling and shear), using the FSL flirt interface. After rigid and affine registration, the images were checked manually to ensure that they were correctly aligned to the template.
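The difference between the rigid (6-DOF) and affine (12-DOF) transforms mentioned above can be illustrated with a short numpy sketch that composes a 4×4 homogeneous matrix from its parameters; the parameter values are hypothetical examples, not values from the study.

```python
import numpy as np

def make_affine(tx=0, ty=0, tz=0, rx=0, ry=0, rz=0,
                sx=1, sy=1, sz=1, shx=0, shy=0, shz=0):
    """Compose a 4x4 affine from translation, rotation (radians),
    scale and shear parameters (12 DOF in total). A rigid transform
    uses only the first six (scales fixed at 1, shears at 0)."""
    T = np.eye(4); T[:3, 3] = [tx, ty, tz]
    cx, sx_ = np.cos(rx), np.sin(rx)
    cy, sy_ = np.cos(ry), np.sin(ry)
    cz, sz_ = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx_], [0, sx_, cx]])
    Ry = np.array([[cy, 0, sy_], [0, 1, 0], [-sy_, 0, cy]])
    Rz = np.array([[cz, -sz_, 0], [sz_, cz, 0], [0, 0, 1]])
    R = np.eye(4); R[:3, :3] = Rz @ Ry @ Rx
    S = np.diag([sx, sy, sz, 1.0])
    Sh = np.eye(4); Sh[0, 1], Sh[0, 2], Sh[1, 2] = shx, shy, shz
    return T @ R @ S @ Sh

rigid = make_affine(tx=5, rz=np.deg2rad(10))            # 6-DOF example
affine = make_affine(tx=5, rz=np.deg2rad(10), sx=1.1)   # adds scaling
```

The linear part of the rigid transform stays orthonormal (no volume change), while the affine example scales the x axis by 10 percent.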
2.3. Generation of misaligned images
Since a supervised learning approach was used to train the ML models, misaligned rigid and affine test images were created by decomposing the corresponding transformation matrix into scales, translations and rotations, altering these parameters, and applying the modified matrices to align the corresponding subject-space images to the common template. Ten test images (five each for rigid and affine) were generated per image by different combinations of altered matrix parameters, to obtain a variety of misalignments. Fig. 1 shows the generated misaligned rigid images for one subject.
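A minimal numpy sketch of this misalignment step, assuming a FLIRT-style 4×4 matrix as input; the perturbation ranges and the choice of a single extra z-rotation are illustrative assumptions, not the paper's actual parameter combinations.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_transform(mat, max_shift=20.0, max_angle_deg=15.0):
    """Return a deliberately misaligned copy of a 4x4 transform by
    composing in an extra rotation about z and adding a random
    translation (mm). Ranges here are hypothetical examples."""
    a = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    Rz = np.eye(4)
    Rz[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    bad = Rz @ mat                      # extra rotation
    bad[:3, 3] += rng.uniform(-max_shift, max_shift, size=3)  # extra shift
    return bad

good = np.eye(4)                        # stand-in for a correct matrix
bad_versions = [perturb_transform(good) for _ in range(5)]
```

Each perturbed matrix would then be applied to the subject-space image (e.g. via flirt's -applyxfm) to produce a misaligned training example.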
2.4. Computation of local cost values
Using a custom-written Python script, the cost values that indicate the quality of registration between the transformed image and the template were computed locally using the cost functions NCC, NMI and CR. The computations were restricted to the brain region using the brain mask of the corresponding 1×1×1 mm³ standard FSL template, and were performed over a 3×3×3 volume of interest (VOI). The final cost value (FCV) is the average of the local cost values obtained by moving the VOI over the entire image with a stride of 3 and computing the cost at each position:

$$\mathrm{FCV} = \frac{1}{C_n}\sum_{i}\sum_{j}\sum_{k}\operatorname{cost}\!\big(M[a,b,c],\,T[a,b,c]\big)$$
In the above equation, a = i·st : (i·st + vs), b = j·st : (j·st + vs), and c = k·st : (k·st + vs) are the index ranges of the moving image M and template T that define the VOI. C_n is the total number of local cost computations, vs is the VOI size, st is the stride, and cost is any one of the costs given in equations (1), (2) and (3) below. Finally, x, y and z denote the size of the template (or moving image) in the three directions. Although the VOI size and stride are free parameters, both were fixed at 3. The cost NCC is computed as

$$\mathrm{NCC} = \frac{1}{N}\sum_{i=1}^{N}\frac{(x_i-\bar{x})(y_i-\bar{y})}{\sigma_x\,\sigma_y} \qquad (1)$$
Above, N is the total number of voxels in the moving image (or template), x_i is the i-th intensity and x̄ the mean intensity of the moving image (the image to be registered); similarly, y_i is the i-th intensity and ȳ the mean intensity of the template. σ_x and σ_y are the standard deviations of the intensities of the moving image and template respectively. The cost NMI is computed as

$$\mathrm{NMI} = \frac{2\,I(x;y)}{H(x)+H(y)} \qquad (2)$$
Here, I(x; y) = H(x) + H(y) − H(x, y) is the mutual information between x (moving image) and y (template), H(x) and H(y) are the entropies of x and y respectively, and H(x, y) is their joint entropy. Finally, the cost CR is computed as

$$\eta^2 = \frac{\sum_{x} n_x\,(\bar{y}_x-\bar{y})^2}{\sum_{x}\sum_{i}(y_{xi}-\bar{y})^2} \qquad (3)$$
In equation (3), y_{xi} is the i-th intensity value in category x (a category being either the moving image or the template), n_x is the number of intensities in category x, ȳ_x is the mean of category x, and ȳ is the mean of the intensities of both moving image and template. The CR is then simply η.
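The VOI sweep described above can be sketched in numpy as follows, using NCC as the local cost; the function names are illustrative, the `mask`, `vs` and `st` arguments mirror the description in the text, and NMI or CR implementations would plug in as alternative `cost` functions.

```python
import numpy as np

def local_ncc(m, t, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches (eq. 1)."""
    mz, tz = m - m.mean(), t - t.mean()
    return float((mz * tz).mean() / (m.std() * t.std() + eps))

def fcv(moving, template, mask=None, vs=3, st=3, cost=local_ncc):
    """Final cost value: average of local costs obtained by sweeping a
    vs x vs x vs VOI over the image with stride st (restricted to the
    brain mask when one is supplied)."""
    vals = []
    X, Y, Z = template.shape
    for i in range(0, X - vs + 1, st):
        for j in range(0, Y - vs + 1, st):
            for k in range(0, Z - vs + 1, st):
                if mask is not None and not mask[i:i+vs, j:j+vs, k:k+vs].any():
                    continue  # skip VOIs entirely outside the brain
                vals.append(cost(moving[i:i+vs, j:j+vs, k:k+vs],
                                 template[i:i+vs, j:j+vs, k:k+vs]))
    return float(np.mean(vals))
```

For a perfectly aligned pair (identical images) the FCV approaches 1, while a misaligned pair yields a noticeably lower average.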
2.5. Training of ML classifiers
The Python-based scikit-learn (sklearn) module was used for fitting/training the ML classifier models [18]. Before being fed to the ML classifiers, the three features (cost values) NCC, NMI and CR were rescaled to values between zero and one, as is standard procedure. The classifiers cross-validated and tested were linear discriminant analysis (LDA), Gaussian naïve Bayes (GNB), linear support vector classifier (SVC), k-nearest neighbors (kNN, with k chosen as 15), random forest (RF, with the number of decision trees chosen as 100), and adaptive boosting (AdaBoost, with 100 decision stumps).
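A minimal sklearn sketch of this training step, shown for three of the listed classifiers; the feature matrix here is a synthetic stand-in (well-aligned images scoring high, misaligned ones low), since the real features would come from the local cost computation described above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Synthetic stand-in for the three features (NCC, NMI, CR).
rng = np.random.default_rng(1)
X_good = rng.normal(0.9, 0.05, size=(100, 3))   # correctly registered
X_bad = rng.normal(0.4, 0.15, size=(100, 3))    # misaligned
X = np.vstack([X_good, X_bad])
y = np.array([0] * 100 + [1] * 100)             # 1 = misaligned

# Rescale each feature to the [0, 1] range, as in the paper.
X_scaled = MinMaxScaler().fit_transform(X)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=15),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_scaled, y)
```

The remaining classifiers (LDA, GNB, linear SVC) follow the same fit interface and would slot into the same dictionary.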
The data were divided into two groups: one for repeated (100 repetitions) stratified 5-fold cross-validation (CV) and the other for testing (Table 1). The area under the ROC curve (AUC) and the F1-score were calculated using sklearn to quantify the performance of each classifier during both CV and testing.
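The CV protocol can be sketched with sklearn's repeated stratified k-fold utilities; for brevity this sketch uses 10 repetitions instead of the paper's 100, synthetic stand-in features, and a single classifier (LDA).

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in features: aligned vs. misaligned cost values.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.9, 0.05, (60, 3)),
               rng.normal(0.4, 0.15, (60, 3))])
y = np.array([0] * 60 + [1] * 60)

# Repeated stratified 5-fold CV (10 repeats here; the paper uses 100).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_validate(LinearDiscriminantAnalysis(), X, y, cv=cv,
                        scoring=["f1", "roc_auc"])
mean_f1 = scores["test_f1"].mean()
mean_auc = scores["test_roc_auc"].mean()
```

Swapping in the other classifiers from Section 2.5 and raising n_repeats to 100 reproduces the described protocol.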
3. RESULTS
The F1-scores during CV and testing for identifying misaligned registrations with the different ML classifiers are given in Table 2 for both rigid and affine registrations.
The cross-validated ML models achieved testing F1-scores in the range 0.914–0.988 and AUC values between 0.972 and 0.991 for identifying misaligned rigid registrations. Fig. 2 shows the ROC curve, i.e. true positive rate (sensitivity) against false positive rate (1 − specificity), for the different cross-validated classifiers evaluated on the test set for finding faulty rigid registrations. Similarly, for identifying misaligned affine registrations, the cross-validated ML models reached testing F1-scores and AUC values in the ranges 0.922–0.989 and 0.989–0.993 respectively. Fig. 3 shows the ROC curve for the different cross-validated classifiers evaluated on the test set for finding incorrectly aligned affine registrations.
4. DISCUSSION
Here, I developed a fully automatic framework using different ML classifier models for QC of rigid and affine registrations during structural MRI pre-processing. Classifier selection could be based on the F1-score, as the number of images in the two classes is heavily imbalanced. From Table 2 and Figs. 2 and 3, it is evident that all classifiers performed well; overall, the kNN, RF and AdaBoost classifiers demonstrated both better testing F1-scores and better AUC values.
Another aspect to consider is that checking the quality of alignment by computing the cost values locally with a moving VOI might have led to the achieved F1-scores and AUC values on the test set. Moreover, applying a stride of 3 reduced the computation time to around a minute per local cost value, making the approach computationally efficient. All the cross-validated models can be saved using the Python pickle module for deployment.
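The deployment step mentioned above amounts to serializing a fitted estimator with pickle; a minimal sketch, with a hypothetical filename and a synthetic stand-in model in place of the paper's trained classifiers.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Fit a stand-in classifier on synthetic aligned/misaligned features.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.9, 0.05, (50, 3)),
               rng.normal(0.4, 0.15, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
model = LinearDiscriminantAnalysis().fit(X, y)

# Save the fitted model to disk and restore it for deployment.
path = os.path.join(tempfile.gettempdir(), "registration_qc_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
```

The restored model predicts identically to the original, so the QC pipeline can load it once and score new registrations without retraining.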
5. CONCLUSION
The developed ML models could be deployed for fully automatic quality control of rigid and affine registrations in big data brain structural MRI pre-processing. Since the framework was validated on both healthy and diseased brains, the trained models may be capable of identifying misalignments in health and disease alike. Future work could involve testing and improving the developed classifier models on other cohorts, and extending them to handle non-linear registrations. Further, deep neural networks could be developed to eliminate the need to compute cost values and the requirement of a template during training. A single framework could also be developed to handle all types of structural images, such as T1-weighted, T2-weighted, FLAIR and proton density.
6. COMPLIANCE WITH ETHICAL STANDARDS
This research study was conducted retrospectively using open-access human subject data (ABIDE I). Ethical approval was not required, as confirmed by the license attached to the open-access data.
7. ACKNOWLEDGMENTS
The author has no relevant financial or non-financial interests to disclose.