Early Brain Imaging can Predict Autism: Application of Machine Learning to a Clinical Imaging Archive

A comprehensive investigation of early brain alterations in Autism Spectrum Disorder (ASD) is critical for understanding the neuroanatomical underpinnings of autism and its early diagnosis. Most previous brain imaging studies in ASD, however, are based on children older than 6 years – well after the average age of ASD diagnosis (~46 months). In this study, we use brain magnetic resonance images that were collected as part of clinical routine from patients who were later diagnosed with ASD. Using 15 ASD subjects of age three to four years and 18 age-matched non-ASD subjects as controls, we perform comprehensive comparison of different brain morphometric features and ASD vs. non-ASD classification by Random Forest machine learning method. We find that, although total intracranial volume (TIV) of ASD was 5.5 % larger than in non-ASD, brain volumes of many other brain areas (as a percentage of TIV) were smaller in ASD and can be partly attributed to larger (>10 %) ventricles in ASD. The larger TIV in ASD was correlated to larger surface area and increased amount of cortical folding but not to cortical thickness. The white matter regions in ASD had less image intensity (predominantly in the frontal and temporal regions) suggesting myelination deficit. We achieved 95 % area under the ROC curve (AUC) for ASD vs. non-ASD classification using all brain features. When classification was performed separately for each feature type, image intensity yielded the highest predictive power (95 % AUC), followed by cortical folding index (69 %), cortical and subcortical volume (69 %), and surface area (68 %). The most important feature for classification was white matter intensity surrounding the rostral middle frontal gyrus and was lower in ASD (d = 0.77, p = 0.04). The high degree of classification success indicates that the application of machine learning methods on brain features holds promise for earlier identification of ASD. To our knowledge this is the first study to leverage a clinical imaging archive to investigate early brain markers in ASD.


Introduction
According to a recent CDC survey (CDC, 2014), 1 in 68 children have a diagnosis of Autism Spectrum Disorder (ASD) and a more recent CDC survey of parents indicated that this number can be as high as 1 in 45 (Zablotsky et al., 2015) Early detection of ASD is important as it allows for the application of early intervention methods. Early intervention is effective in reducing the impact of impairments (Dawson et al., 2010) and may result in more positive long-term outcomes for the child (Rogers and Vismara, 2010;Pickles et al., 2016). The risk for autism is influenced by genetic and pre, peri, and post-natal environmental factors (Sandin et al., 2014). The environmental factors related to the risk for autism can interact with genetic factors making the identification of ASD etiology extremely difficult. In addition to better clinical outcomes, early detection of ASD can also help in separating the effects of post-natal environmental risk factors of ASD and this can lead to improving our knowledge of its etiology.
The median age of ASD diagnosis is 46 months (CDC, 2014) and current ASD diagnosis is based on a clinical assessment of the individual's behavior and intellectual abilities. However, this approach is limited as early diagnosis is not straightforward and behaviorally based diagnosis procedures can be subjective, time consuming, and inconclusive due to factors such as comorbidity (Close et al., 2012). Furthermore, such an approach does not provide insight into the neural underpinnings of ASD since it is based only on behavioral symptoms. Magnetic resonance imaging (MRI) is a non-invasive tool widely used to capture brain morphology. ASD diagnosis based on MRI can be objective and can potentially be utilized even at the prenatal and neonatal stage (Glenn, 2010) and hence can be a useful tool for brain biomarker discovery for early detection of ASD. MRI has already been successfully utilized for early diagnosis of brain disorders such as Alzheimer's (Frisoni et al., 2010;Hampel et al., 2010;Hazlett et al., 2017).
A number of MRI studies have reported alterations in brain regions involved in language and social behavior in ASD, particularly in fronto-temporal regions (Bigler et al., 2007;Ha et al., 2015) , and the amygdala-hippocampus complex (Groen et al., 2010;Nordahl et al., 2012). Early brain overgrowth (Campbell et al., 2014) and alterations in corpus callosum (Wolff et al., 2015) and cerebellum (D'Mello et al., 2015) have also been reported. However, these findings have been somewhat inconsistent (Jumah et al., 2016;Katuwal et al., 2016b). Similarly, in recent years, there have been a number of studies that have applied machine learning techniques for ASD vs. non-ASD classification using MRI derived brain features. These studies have reported high classification accuracies (>80%) for well-matched datasets (Ecker et al., 2010;Jiao et al., 2010;Katuwal et al., 2015). Most MRI studies that investigate ASD brain alterations have used subjects older than six years and there are only a very few studies in early childhood (< four years) (Nordahl et al., 2012;Auzias et al., 2014;Padilla et al., 2015;Hazlett et al., 2017). Of the studies using subjects less than age four years, only a small subset of brain features such as volume, total cortical surface area, and shape have been investigated. As such, there is a significant knowledge gap in brain imaging research on early ASD brain alterations. A comprehensive investigation of brain alterations in early childhood in children with ASD is needed to better characterize this disorder and merits special attention due to the importance of early detection of ASD.
In this study, we compare brain features of ASD subjects in early childhood (three to four years) with non-ASD subjects of the same age group using a comprehensive set of cortical and sub-cortical morphometric features. In addition, we include image intensity features (mean and standard deviation of intensity of brain structures). Multi-variate brain alterations are identified in a purely data-driven approach by applying machine learning models trained for ASD vs. non-ASD classification.

Subjects
Head MR images used in this study were obtained from the clinical imaging archive of Geisinger Health System, Danville, PA. We queried the Geisinger imaging archive to identify ASD patients who had an MR image before the  Figure A2 in Appendix A. We performed a strict data quality analysis to remove images with imaging artifacts, motion, lesions, and abnormally large ventricles. The remaining images (247 non-ASD and 122 ASD) were processed using Freesurfer v 5.3.0 (Fischl, 2012) recon-all workflow with default settings and the images that failed Freesurfer segmentation (5 non-ASD and 7 ASD) were excluded from the study. We then removed 120 non-ASD subjects as they had a neurodevelopmental disorder as identified by ICD 9 codes. This resulted a total of 112 non-ASD and 115 ASD subjects. To avoid gender confounds (Lai et al., 2013), 54 female subjects (20 non-ASD, 34 ASD) were excluded.
The lower threshold for age was based on the applicability of Freesurfer on brain images of very young subjects. Freesurfer which was initially designed for adult subjects may not provide reliable estimates for subjects younger than three years. As images of subjects as young as three years have been successfully segmented by Freesurfer (Retico et al., 2016), lower threshold for age was set to three years. The upper threshold for age was set to four years to look for early markers.
The segmentation results from Freesurfer of remaining 41 images (23 non-ASD, 18 ASD) were assessed using the ENIGMA protocols (http://enigma.ini.usc.edu/protocols/imaging-protocols/). Five non-ASD and one ASD images with imperfect Freesurfer segmentations were excluded from the study. Our final dataset consisted of 15 ASD subjects(42.6 ± 3.5 months; 37.7 to 47.3 months) and 18 non-ASD (41.4 ± 2.9 months; 37.0 to 47.0 months) subjects which was used as the control group; no significant age difference (Kolgmogorov-Smirnov statistic = 0.23, p = 0.76). See Table 2.1 for subject demographics and image parameters.

Brain Features Extraction
Brain features were extracted using the recon-all workflow of Freesurfer v. 5.3.0 Fischl et al., 1999Fischl et al., , 2002Ségonne et al., 2004). The volume of 40 sub-cortical structures from Aseg atlas (Fischl et al., 2002) were extracted. Similarly, volume, surface area, Gaussian curvature, mean curvature, folding index, curvature index, thickness mean, thickness standard deviation (std.), intensity mean, and intensity std. of 34 cortical structures from Desikan-Killiany atlas (Desikan et al., 2006) were extracted. In total, 687 brain morphometric and intensity features were derived for each image. Intensity mean and intensity std. of a brain structure is the mean and standard deviation respectively of the voxel intensities within the brain structure. The volume features of each subject were normalized by its TIV since relative volumes can be directly compared across subjects and are more robust against scanner effects (Takao et al., 2011). To eliminate the effects of image intensity biases, intensity features of each image were normalized by their respective sums across all the brain structures.

Univariate Analysis
The statistical significance of ASD vs. non-ASD brain feature differences was calculated using two sample t-tests. For each feature type, multiple comparisons correction was performed using false discovery rate (Benjamini and Hochberg, 1995). The effect size (ES) of the differences were quantified by Cohen's d ( Baron-Cohen et al., 2000).
2.4 Multivariate Analysis: ASD vs. non-ASD classification using Random Forest ASD vs. non-ASD classification was performed by Random Forest (RF) (Breiman, 2001) classification models trained with all 687 FreeSurfer brain features. At first RF models were trained using all brain features and then models for each feature type were trained separately. Classification success was quantified by area under the ROC curve (AUC). The average classification success across the data was estimated by 5-fold cross-validation with stratified folds. Similarly, classification contribution or importance scores for features were average across the 5 folds. For detailed methodology of multivariate analysis, please see Appendix A at the end.

ASD vs. non-ASD Brain Feature Differences
The brain features for which ASD vs. non-ASD differences were statistically significant (p < 0.05, uncorrected) are presented in Figure 1. Effect sizes of all these differences were moderate to large (Cohen's d > 0.5). None of the features survived the multiple comparisons correction and all the results reported heron are before the correction.
Statistically significant area features were mainly located in frontal, temporal, supramarginal, and posterior cingulate regions. Except for the area of the left temporal pole, the other eight statistically significant area features were larger in ASD. The thickness mean of entorhinal gyrus was larger in ASD and the effect size was large (d > 1; p = 0.0009) whereas the thickness std. of the right entorhinal gyrus was smaller in ASD (d = 0.75; p = 0.035). Similarly, thickness std. of the right superior frontal, fusiform, and caudal middle frontal gyri were larger in ASD (d > 0.60).
Most of the volumes with statistically significant differences (not corrected for multiple comparisons) were smaller in ASD except for the right postcentral, supramarginal, and 5 th ventricle volumes. However, it should be noted these reported volume features have been normalized by TIV. Raw volumes of these structures were in fact larger in ASD but the differences were not statistically significant. Similarly, all global raw volumes were larger in ASD, but when normalized by TIV, most of them were smaller in ASD. This discrepancy is mainly because TIV in ASD was larger than in non-ASD by 5.5% (d = 0.4, p = 0.17); see Figure B1 in Appendix B. Larger TIV in ASD was mainly due to the larger ventricles. All ventricles were larger (>10%) in ASD, in particular the 5th ventricle was 290% larger in ASD (p = 0.015). Total ventricular CSF volume was 27.89% larger in ASD (d = 0.38, p = 0.28) and after normalizing by TIV, it was 19.14% larger (d = 0.29, p = 0.42).
The folding index and curvature index features with statistically significant differences were present in the frontal, temporal, cingulate, postcentral, and precuneus regions and all were larger in ASD. Folding indices of 58 out of 68 cortices were higher in ASD. The differences in curvature index of the left posterior cingulate gyrus and the folding index of the right precuneus gyrus were large (d > 1). Similarly, the Gaussian curvature of the left posterior cingulate gyrus and left pericalcarine gyri were larger in ASD.
In image intensity mean features, there were two distinct type of differences: in general, white matter (WM) regions had lower and sub-cortical regions had higher mean intensity in ASD. Mean intensity of the WM structures near frontal, supramarginal, precuneus, precentral, and pars opercularis gyri were lower in ASD. In contrast, mean intensity of the Maybe we should share all the feature differences as an excel file? Generally refered as septum pallidum http://www.radlex.org/RID/RID6525.  right thalamus, left caudate, left putamen, and left accumbens were higher in ASD. In general, in ASD, intensity std. were higher in WM near gyri but smaller in cerebellum WM and corpus callosum (not significant).

ASD vs. non-ASD Classification
ASD vs. non-ASD classification AUC scores for each feature type are presented in Figure 2. The point and error bar represent the mean and standard deviation of the AUC scores from 5 test folds. An average AUC of 0.92 was achieved when all brain features were used. When the classification was performed separately for each brain feature type, we found that most of the predictive power came from the intensity mean (AUC = 0.83), folding index (AUC=0.69), volume (AUC=0.69), and area (AUC = 0.69) features. When intensity mean and intensity std. features were used together, AUC of 0.95 was achieved.

Important Features for Classification
The important features in classification for the feature types that yielded high AUCs (intensity mean, folding index, volume, and area) are presented in Figure 3. After sorting the importance scores of the features in descending order, the features whose cumulative sum of scores was at least 50% of the total importance scores were deemed as important for classification and are presented in Figure 3. In Figure 3, a feature is represented by a bar and its length is proportional to the contribution of the feature towards classification relative to the most important feature. The numbers at the left of the bars represent the effect size of the ASD vs. non-ASD difference and the asterisk represent statistical significance of the difference at 0.05.
When only image intensity mean features were used for classification, the most important feature was the intensity mean of the WM neighboring rostral middle frontal gyrus and compared to non-ASD, it was lower in ASD (d = 0.77, p = 0.04). The mean intensities of the WM neighboring the right caudal middle frontal, left temporal pole, and the right pars opercularis were also important for classification and were smaller in ASD. Other important brain structures were the left accumbens-area, WM-hypointensities, and right vessel. Similarly, when only area features were used for classification, the surface areas of the left middle temporal gyrus, left posterior cingulate gyrus, right bank of superior temporal sulcus (bankssts), and right rostral middle frontal gyrus were important for classification. All of the important area features were larger in ASD. In classification using only folding index features, the left middle temporal gyrus was the most important brain structure for classification. It was also the most important structure while performing classification using area features. Other important structures were left middle temporal, left rostral middle frontal, and left inferior parietal gyrus. All of the important folding index features were larger in ASD. Similarly, when classification was performed only with volume features, the volumes of the left putamen, left inferior temporal gyrus, and left accumbens-area were important for classification and these features were smaller in ASD after normalizing for TIV. When all the features were used for classification, the most important feature was the mean intensity of the WM neighboring the rostral middle frontal gyrus; note that the same region was the most important feature in the classification using image intensity features only. The mean intensities of WM connecting left temporal pole and right caudal middle frontal gyrus, were important for classification and were smaller in ASD.

Discussion
We achieved high ASD vs. non-ASD classification success using brain morphometric and image intensity features. The important features for classification were mainly present in the frontal and temporal regions and these regions have been previously associated with ASD (Bigler et al., 2007;Ha et al., 2015). Most of the discriminative or predictive power for classification came from the intensity features followed by folding index, volume, and area. In summary, compared to non-ASD three main brain alterations in ASD were noted: (1) abnormally larger ventricles, (2) higher cortical gyrification, and (3) lesser WM intensity in frontal and temporal region. This is the first study to perform ASD vs. non-ASD classification in a very young population (3 to 4 years) using comprehensive brain features and to achieve high classification success rates. Very few studies have performed similar investigations to identify brain alterations in ASD.

Early brain overgrowth in ASD
One of the most replicated findings in ASD is that toddlers with ASD (age 2-4 years) on average have a larger head size than non-ASD (Carper et al., 2002;Courchesne et al., 2011;Hazlett et al., 2012;Campbell et al., 2014). A recent study by (Hazlett et al., 2017) also reported 5.5% larger TIV in high risk ASD compared to negative high risk ASD at the age of two years. Our study also shows 5.5% larger TIV in ASD. The larger TIV in ASD in our study was mainly due to the larger ventricular volume in ASD (27.9% and 19.1% as a percentage of TIV) whereas volume of other brain regions (as a percentage of TIV) were smaller in ASD. Padilla et al. (2015) have also reported smaller brain volumes in temporal, occipital, insular, and limbic regions in ASD after adjusting for total brain volume.

Overgrowth is related to the increase in surface area and is relatively independent to cortical thickness
In our study, only one thickness feature (right entorhinal gyrus thickness, smaller in ASD) had statistically significant group difference. However, there were nine area features (eight were larger in ASD) whose differences were statistically significant and cortical surface area was 7% larger in ASD. In addition, thickness features yielded near chance classification success suggesting low discriminative power for ASD classification while area features yielded AUC of 0.69. Two major longitudinal studies (Hazlett et al., 2011(Hazlett et al., , 2017 using subjects of age similar to our study report similar findings. A study by Hazlett et al. (2011) found no differences in cortical thickness but larger surface area in ASD at both 2 years and 4.5 years. Similarly, a recent study by Hazlett et al. (2017) using subjects of age 6 to 24 months, report 7% larger cortical surface area in ASD but no difference in cortical thickness. They also report insignificant contribution of thickness features in classification of ASD subjects with ASD from non-ASD subjects. These results suggests that while the group difference in cortical surface area increases with the early brain overgrowth, the difference in cortical thickness remains somewhat constant.

Relationship between overgrowth, cortical expansion, and higher cortical folding in ASD brains
Folding indices of most of the gyri (58 out of 68) were greater in ASD and folding index features yielded AUC of 0.69. On average, the cortex of ASD brains were 12.7% more folded than non-ASD. This suggests that ASD brains generally exhibit more gyrification and cortical folding. Similar to our study, Ecker et al. (2016) have reported that ASD individuals had a significant increase in gyrification around the left pre and post-central gyrus. Similarly, a study by Auzias et al. (2014) reported higher folding of the right intraparietal, the left medial frontal, and the left central sulci in children with ASD.

Hyper-expansion of the cortex during early brain overgrowth is a cause of higher cortical folding in ASD
The higher amount of cortical folding in ASD may be the after effect of early brain overgrowth in ASD toddlers. Based on insights from the cortical gyrification hypothesis based on cortical expansion (Tallinen et al., 2014(Tallinen et al., , 2016, we hypothesize that higher folding in ASD could be due to larger compressive stress in the cortex produced by the tangential hyper-expansion of the cortical layer. We found that the correlation between the average amount of cortical folding and total surface area of the cortex 0.68 (p = 0.005) in ASD and 0.78 (p = 0.0002) in non-ASD. Similarly, the correlation between the average amount of cortical folding and TIV was 0.53 (p = 0.04) in ASD and 0.65 (p = 0.03) in non-ASD.

Larger ventricles contribute to higher cortical folding in ASD by creating additional compressive stress
We also hypothesize that the abnormally larger ventricles in ASD induces additional compressive stress in the cortex and this leads to higher cortical folding. In this study, we note that the correlation between average amount of cortical folding and total ventricular CSF was 0.56 (p = 0.02) for ASD, but was -0.13 (p = 0.6) for non-ASD. The statistically significant positive correlation in ASD suggests the possibility of some degree of cortical folding may be due to the larger ventricles. In contrast, the non-significant correlation in TDC suggests that the cortical folding is not affected by the ventricles because they are of normal size and hence do not extend additional compressive stress to the cortex.

4.3.3
Larger volume of extra-axial fluid in ASD contributes to higher cortical folding in ASD Furthermore, it is possible that the larger volume of extra-axial fluid (CSF in the sub-arachnoid space) in ASD also contributes to the compressive stress in the cortex and hence more folding. A study by Shen et al. (2013) has reported that extra-axial fluid in ASD was 25% more than low-risk typical infants at the age of 6 to 24 months. In our study, we could not perform the analysis for the extra-axial fluid volume because FreeSurfer does not output the measure since T1 MRI does not have sufficient contrast between CSF and air as both appear dark in T1.

WM in frontal and temporal regions less myelinated in ASD
Image intensity mean of the WM neighboring frontal and temporal regions were lower in ASD and some of these features were the most important for classification. Less image intensity mean in WM means it is less bright suggesting less myelination surrounding the axons. Models using brain intensity features yielded high classification success and most of the important features were WM neighboring frontal and temporal regions. This suggests myelination deficits in frontal and temporal regions may be a potential early brain marker of ASD.
Several previous studies have also reported WM myelination deficits in ASD (Peters et al., 2012;Zinkstok et al., 2012;Croteau-Chonka et al., 2016). Zinkstok et al. (2012) reported that individuals with ASD had significantly less myelin content in numerous brain regions and WM tracts. In addition, they report that the observed myelination deficit increased with autism severity. Similarly, Peters et al. (2012) have reported myelination deficits in WM in autistic brains of age 0.5 to 25 years. These results suggest that brain disconnectivity resulting from insufficient development of the myelin sheath may be one of the underlying causes of ASD and myelination deficits in frontal and temporal regions in particular, may be a potential marker for early detection of ASD.

Small sample size
The sample size of the study was only 33. Further studies with larger sample sizes are needed to evaluate the robustness of our findings.

Findings are based on only male subjects
Only male subjects were used in this study to exclude gender confounds. Findings of this study can be interpreted in the context of brain abnormalities in male ASD subjects only and not the general ASD population because previous studies have shown gender differences in ASD brain morphometry.

Effect of preprocessing methods
Freesurfer was used to extract morphological features from brain MRIs. There have been several studies showing the effects of preprocessing methods and in some cases the bias introduced by the methods were larger than the actual group differences (Katuwal et al., 2016a). One way to minimize the possibility of interpreting method effects as biological effects is to use multiple preprocessing methods and to cross-validate the results. However, at the time of this study, Freesurfer was the only available tool that generated the extensive morphological brain features we wanted to use for machine learning. In addition, Freesurfer which was initially designed for adult subjects may introduce some biases during the preprocessing of the images of the young subjects who are three to four years old. However, images of subjects as young as three years have been successfully segmented by Freesurfer (Retico et al., 2016) and this is why we decided to use the images of the subjects as young as three years.
4.6 Lack of first-hand confirmation of ASD/non-ASD status ASD status and non-ASD status were based on diagnostic codes from the electronic health records. Although, strict research criteria were not applied to confirm ASD status, each subject had at least 4 occurrences of ASD diagnoses assigned by clinical experts.

Effect of IQ
We have no information on IQ for either group. Significant group mean differences in IQ or differences in IQ distribution could be responsible for some of the findings (rather than ASD status).

Non-ASD subjects were not typically developing children
The non-ASD subjects which were used as controls were imaged for some reason and may have had atypical development. As such, results presented here are brain differences between ASD and non-ASD subjects and not ASD and typically developing children. Figure A2: Subject Selection Flowchart.