Abstract
Brain age prediction studies measure the difference between the chronological age of an individual and their age predicted from neuroimaging data. This difference has been proposed as an informative measure of disease and cognitive decline. As most previous studies relied exclusively on magnetic resonance imaging (MRI) data, we investigate here whether combining structural MRI with functional magnetoencephalography (MEG) information improves age prediction in a large cohort of healthy subjects (N=613, aged 18–88 years) from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) repository. To this end, we examined the performance of dimensionality reduction and multivariate associative techniques, namely Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA), to tackle the high dimensionality of neuroimaging data. MEG features yielded worse performance than MRI features, but the combination of both feature sets slightly improved age prediction (mean absolute error of 5.28 yrs). Furthermore, PCA degraded performance, whereas CCA in conjunction with Gaussian process regression models yielded the best prediction performance. Notably, CCA allowed us to visualize the features that significantly contributed to age prediction. We found that MRI features from subcortical structures were more reliable age predictors than cortical features, and that spectral MEG measures were more reliable than connectivity metrics. Our results provide insight into the underlying processes that are indicative of brain aging. They may thereby advance the discovery of valuable biomarkers of neurological syndromes that emerge later in the lifespan.
1 Introduction
The human brain changes continuously across the adult lifespan. This process, termed brain aging, underlies the gradual decline in cognitive performance observed with age. Although aging-induced changes are not necessarily pathological, the risk of developing neurodegenerative disorders rises with increasing age (Abbott, 2011). The wide range of age-associated brain disorders indicates that the effect of aging on brain structure and function varies greatly among individuals. In fact, diseases such as Alzheimer’s disease and schizophrenia are thought to result from pathological processes associated with accelerated brain aging (Kirkpatrick et al., 2008; Sluimer et al., 2009). A better understanding of the neural correlates of brain aging could therefore help improve the detection of early-stage neurodegeneration or the prediction of age-related cognitive decline. The same holds for better ways to identify biomarkers of healthy aging.
One promising approach for identifying individual differences in brain aging relies on using neuroimaging data to accurately predict “brain age”, the biological age of an individual’s brain (Cole et al., 2019). In that context, machine learning (ML) techniques have proven to be a promising tool to ‘learn’ a mapping between patterns in structural or functional brain features and the age of an individual (Dosenbach et al., 2010; Franke et al., 2010). In other words, ML techniques estimate brain age by learning a function over a high-dimensional space in which each dimension corresponds to a feature derived from neuroimaging data. When predictive models are trained on lifespan neuroimaging datasets with a large number of subjects, they can generalize well to unseen or ‘novel’ individuals. This makes it possible to deploy ML models at the population level and to use the predicted age as a biomarker of atypical brain aging.
Most studies have applied ML to neuroimaging data to quantify atypical brain development in diseased populations. A common practice entails training an ML-based prediction model on healthy subjects and subsequently using it to estimate brain age in patients. The difference between an individual’s predicted brain age and their chronological age (the “brain age delta”) is then computed. This difference provides a potential indicator of increased risk of pathological changes that may lead to neurodegenerative diseases. For instance, this approach has been applied to study a range of brain disorders and diseases:

- Alzheimer’s disease (Franke and Gaser, 2012; Gaser et al., 2013)
- traumatic brain injury (Cole et al., 2015)
- schizophrenia (Koutsouleris et al., 2014; Schnack et al., 2016; Shahab et al., 2019)
- epilepsy (Pardoe et al., 2017)
- dementia (Wang et al., 2019)
- Down’s syndrome (Cole et al., 2017a)
- Prader-Willi syndrome (Azor et al., 2019)
- several others (Kaufmann et al., 2019)

It has also been applied to other conditions, including:

- chronic pain (Cruz-Almeida et al., 2019)
- HIV (Cole et al., 2017c)
- diabetes (Franke et al., 2013)

The utility of estimating brain age has also been extended beyond neurological disorders. For example, it has been used to test the positive influence of meditation (Luders et al., 2016), and of education and physical exercise (Steffener et al., 2016b), on brain age. Recent work has also shown a relationship between the brain age delta and specific cognitive functions, namely visual attention, cognitive flexibility, and semantic verbal fluency (Boyle et al., 2019).
The studies mentioned above have mainly focused on estimating brain age from structural magnetic resonance imaging (MRI), with most studies using T1-weighted images (e.g. Cole, Leech and Sharp, 2015; Cole, Poudel, et al., 2017). This is partly due to the availability of large open MRI datasets spanning the lifespan. These datasets have allowed researchers to train and validate their predictive models on a large number of subjects. However, it is well known that, in addition to structural alterations, changes in brain function also occur during aging (Cabeza et al., 2018; Grady, 2012; Peters, 2006).

One example of an age-related change in brain function involves functional connectivity, defined as the similarity between activity in different brain regions (Sala-Llonch et al., 2015). Functional connectivity derived from functional MRI (fMRI) data has been successfully used to predict age (Dosenbach et al., 2010; Li et al., 2018; Liem et al., 2017; Nielsen et al., 2018; Vergun et al., 2013). A few studies have also investigated age-related changes in brain function using electroencephalography (EEG) (Dimitriadis and Salis, 2017; Sun et al., 2019; Zoubi et al., 2018). Specifically, this modality has enabled researchers to build brain age prediction models based on the temporal and spectral features of electrophysiological brain activity, as well as on the connectivity between brain regions. A detailed overview of the neuroimaging modalities and ML methods that have been used to estimate brain age is presented by J. Cole et al. (2018).

However, the aforementioned studies investigated age-related structural and functional brain changes in isolation. The exception is Liem et al. (2017), who combined cortical anatomy and fMRI. Furthermore, no study to date has exploited the high spatiotemporal resolution of magnetoencephalography (MEG) data (Baillet, 2017) to estimate brain age.
Moreover, a major roadblock to the clinical application of ML models is their lack of explainability (Bzdok and Ioannidis, 2019), that is, the ability to attribute their predictions to specific input variables. From a clinical perspective, it would be useful to identify which neuroimaging features the ML model relies on to estimate brain age. As argued by Kriegeskorte and Douglas (2019), decoding models can reveal whether information pertaining to a specific outcome or behavioural measure is present in a particular brain region or feature. In the same study, the authors also highlighted the difficulties and confounds associated with interpreting the weights of a linear decoding model. They consequently suggested the use of multivariate techniques to identify informative predictors.
This study addresses the challenges outlined above: combining structural and functional neuroimaging data to predict age, and explaining the associated ML models. Its main aims were to:

(i) investigate whether combining information from multiple modalities (MRI and MEG) would improve brain age prediction performance;
(ii) examine the performance of dimensionality reduction techniques in conjunction with ML models;
(iii) improve the explainability of the brain age prediction framework by applying multivariate associative statistical methods to identify the key features that exhibit the most prominent age-related changes.

To do so, we used structural MRI and functional MEG data collected from a large cohort of healthy subjects. We applied Principal Component Analysis (PCA) as a dimensionality reduction technique and Canonical Correlation Analysis (CCA) as a multivariate associative technique, and assessed their predictive performance. Finally, we visualized the most informative features in the context of age prediction.
2 Materials and Methods
2.1 Dataset
We analyzed data from the open-access Cambridge Centre for Ageing and Neuroscience (Cam-CAN) repository (see Shafto et al., 2014; Taylor et al., 2017 for details of the dataset and acquisition protocols), available at https://camcan-archive.mrc-cbu.cam.ac.uk//dataaccess/. Specifically, we used structural (T1-weighted MRI) and functional (resting-state MEG) neuroimaging data from 652 healthy subjects (male/female = 322/330, mean age = 54.3 ± 18.6 years, age range 18–88 years).

The MR images were acquired on a 3T Siemens TIM Trio scanner with a 32-channel head coil. They were acquired using an MPRAGE sequence with the following parameters:

- TR = 2250 ms
- TE = 2.99 ms
- flip angle = 9°
- field of view = 256 × 240 × 192 mm³
- voxel size = 1 mm isotropic

The resting-state MEG data were recorded using a 306-channel Elekta Neuromag Vectorview system (102 magnetometers and 204 planar gradiometers) at a sampling rate of 1 kHz. During the resting-state scan, subjects were asked to lie still and remain awake with their eyes closed for approximately 9 min.

Several exclusions were applied, for example subjects who did not have both MRI and MEG data, or whose pre-processing results were unsatisfactory. Unsatisfactory results included failure to remove cardiac and ocular artifacts and/or failure to extract the cortical surface for source reconstruction. After these exclusions, we report findings from a final dataset of 613 subjects. A descriptive list of the subjects included in our dataset is provided in the Supplementary Materials.
2.2 Neuroimaging data processing
A summary of the entire feature extraction process for MR images and MEG recordings is illustrated in Fig. 1.
2.2.1 MRI structural analysis
The processing of T1-weighted MR images followed the pipeline presented by Cole et al. (2017b) and was implemented using tools from the FMRIB Software Library (FSL, http://www.fmrib.ox.ac.uk/fsl) (Jenkinson et al., 2012). The steps were as follows:

1. The Brain Extraction Tool (BET) (Smith, 2002) was used to isolate brain tissue.
2. FMRIB’s Linear/Nonlinear Image Registration Tools (FLIRT/FNIRT) (Andersson et al., 2007; Jenkinson and Smith, 2001) were used to perform a non-linear registration to the MNI152 template brain (2 mm resolution).
3. The registered images were segmented into grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF) using the MNI152 template mask for each tissue type.
4. The GM maps were further segmented into cortical and subcortical regions to delineate the effects of aging on these regions.
5. The resulting images were vectorized and subsequently z-scored to obtain a feature vector for each subject.

This process yielded a feature matrix in which each row consisted of the normalized intensity values for a single subject (see Fig. 1 for the exact number of features from each brain structure).
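A minimal sketch of the vectorization and z-scoring step is given below. It assumes the images have already been registered to MNI152 space and loaded as NumPy arrays; loading them, e.g. with nibabel, is omitted. The text does not state along which axis the z-scoring was performed, so here each voxel is assumed to be standardized across subjects. The function name `build_feature_matrix` is hypothetical.

```python
import numpy as np

def build_feature_matrix(volumes, mask):
    """Stack masked, vectorized images into a (subjects x voxels) matrix.

    volumes: list of 3D arrays, one per subject, all in the same template space.
    mask: boolean 3D array selecting the voxels of one tissue/region (e.g. GM).
    Assumption: each voxel is z-scored across subjects.
    """
    X = np.stack([vol[mask] for vol in volumes])  # shape (n_subjects, n_voxels)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant voxels at zero instead of dividing by zero
    return (X - mean) / std
```

In a cross-validated setting, the means and standard deviations would ideally be estimated on the training subjects only.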
2.2.2 MEG analysis
The MEG data were processed using the open-source software MNE-Python (https://martinos.org/mne) (Gramfort et al., 2014). The processing steps were as follows:

1. Raw MEG data were high-pass filtered at 1 Hz, notch filtered at 50 Hz and 100 Hz to remove power line artifacts, and resampled to 200 Hz.
2. Cardiac and eye movement artifacts were identified using Independent Component Analysis (ICA). They were classified automatically by comparing the ICA components with the simultaneously recorded electrocardiography (ECG) and electrooculography (EOG) signals (Jas et al., 2018).
3. Artifact-free MEG data were projected from sensor space to source space on each subject’s cortical surface using the linearly constrained minimum variance (LCMV) beamformer (Van Veen et al., 1997). The cortical surface was reconstructed from the T1-weighted MR images using the FreeSurfer recon-all algorithm (Dale et al., 1999; B Fischl et al., 1999; Bruce Fischl et al., 1999; Fischl et al., 2004, 2002, 2001; Fischl and Dale, 2000). The sources were constrained to the cortex, and their orientation was assumed to be perpendicular to the cortical surface. The noise covariance matrix was estimated from the empty-room recordings, and the data covariance matrix was estimated directly from the MEG data.
4. After source reconstruction, we parcellated the cortex into 148 brain regions using the Destrieux atlas (Destrieux et al., 2010).
5. Each parcel time series was corrected for signal leakage using a symmetric, multivariate correction method intended for all-to-all functional connectivity analysis (Colclough et al., 2015).
6. For each parcel, the power spectral density (PSD) of the entire resting-state scan was calculated and averaged within 7 frequency bands, namely Delta (2–4 Hz), Theta (4–8 Hz), lower Alpha (8–10 Hz), higher Alpha (10–13 Hz), lower Beta (13–26 Hz), higher Beta (26–35 Hz) and Gamma (35–48 Hz).
7. Relative power was calculated by dividing the power within each band by the total power across all bands (Niso et al., 2019).
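The band-averaged relative power can be sketched as follows for parcel time series held in a NumPy array. This is an illustration, not the MNE-Python code used in the study. Several choices are assumptions, as none are specified in the text:

- Welch's method (`scipy.signal.welch`) with 2 s segments as the PSD estimator;
- band power taken as the mean PSD within each band;
- half-open band edges.

```python
import numpy as np
from scipy.signal import welch

# Frequency bands (Hz) as listed in the text; edges treated as [low, high)
BANDS = {
    "delta": (2, 4),
    "theta": (4, 8),
    "alpha_low": (8, 10),
    "alpha_high": (10, 13),
    "beta_low": (13, 26),
    "beta_high": (26, 35),
    "gamma": (35, 48),
}

def relative_band_power(parcel_ts, sfreq=200.0, nperseg=400):
    """Relative band power for each parcel.

    parcel_ts: array of shape (n_parcels, n_times).
    Returns an array of shape (n_parcels, n_bands) whose rows sum to 1.
    """
    freqs, psd = welch(parcel_ts, fs=sfreq, nperseg=nperseg, axis=-1)
    # Average the PSD within each band
    band_power = np.column_stack([
        psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
        for lo, hi in BANDS.values()
    ])
    # Divide by the summed power across all bands
    return band_power / band_power.sum(axis=1, keepdims=True)
```

For 148 parcels this returns a 148 × 7 matrix per subject, which would be flattened into the PSD part of the MEG feature vector.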
In addition to the PSD values, amplitude envelope correlation (AEC) within each frequency band was used to estimate functional connectivity between cortical parcels (Brookes et al., 2012; Hipp et al., 2012). This method provides a robust measure of stationary connectivity (Colclough et al., 2016). Inter-layer coupling (ILC) was also calculated from the functional connectivity matrices to estimate the similarity of connectivity profiles across frequency bands (Tewarie et al., 2016). Thus, each row of the resulting MEG feature matrix consisted of the PSD, AEC and ILC values for a single subject.
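AEC between parcels can be illustrated with the sketch below. It band-pass filters each parcel time series, takes the amplitude envelope as the magnitude of the analytic (Hilbert) signal, and computes Pearson correlations between the envelopes. The zero-phase fourth-order Butterworth filter and the function name are assumptions. The leakage correction of Colclough et al. (2015) is assumed to have been applied beforehand and is not implemented here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def amplitude_envelope_correlation(parcel_ts, band, sfreq=200.0, order=4):
    """AEC matrix for one frequency band.

    parcel_ts: (n_parcels, n_times), assumed already leakage-corrected.
    band: (low, high) in Hz.
    Returns an (n_parcels, n_parcels) matrix of envelope correlations.
    """
    sos = butter(order, band, btype="bandpass", fs=sfreq, output="sos")
    filtered = sosfiltfilt(sos, parcel_ts, axis=-1)    # zero-phase band-pass
    envelope = np.abs(hilbert(filtered, axis=-1))      # instantaneous amplitude
    return np.corrcoef(envelope)                       # rows are parcels
```

Calling this once per band yields the seven all-to-all matrices whose entries (or upper triangles) form the AEC features.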
2.3 Brain age prediction analysis
2.3.1 Gaussian Process Regression (GPR)
Gaussian Process Regression (GPR) has been widely used for predicting chronological age from T1-weighted images (Aycheh et al., 2018; Cole et al., 2017a, 2017b, 2017c, 2015; J. H. Cole et al., 2018). GPR is a non-parametric Bayesian approach that infers a distribution over possible functions consistent with the data (Rasmussen and Williams, 2006). The main assumption underlying GPR is that the function values at any finite set of input points follow a multivariate Gaussian distribution. A GP is therefore fully specified by a mean function and a covariance (kernel) function, which together encode the prior belief about the relationship between variables. The posterior predictive variance indicates the confidence of model predictions. Because the kernel defines the covariance between any pair of data points, it can capture local patterns of similarity between individual observations. Therefore, with a suitable kernel, a GP can model non-linear relationships and is more flexible than conventional parametric models, which rely on fitting global models.
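For reference, the standard GP predictive equations (Rasmussen and Williams, 2006) are written out below. They use the dot-product plus white-noise kernel described in the next paragraph and assume a zero prior mean. Here σ₀² denotes the dot-product bias term, σₙ² the white-noise level, x₁, …, xₙ the training feature vectors, y the training ages, and x₎ a test subject.

```latex
% Dot-product kernel evaluated on training and test inputs
K_{ij} = \sigma_0^2 + \mathbf{x}_i^\top \mathbf{x}_j, \qquad
(\mathbf{k}_*)_i = \sigma_0^2 + \mathbf{x}_i^\top \mathbf{x}_*, \qquad
k_{**} = \sigma_0^2 + \mathbf{x}_*^\top \mathbf{x}_*

% Predictive mean (predicted age) and variance of the latent function at x_*
\bar{f}_* = \mathbf{k}_*^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y}

\mathbb{V}[f_*] = k_{**} - \mathbf{k}_*^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*
```

The white-noise kernel enters only through the σₙ²I term on the training covariance. With a dot-product kernel, the GP is equivalent to Bayesian linear regression with a Gaussian prior on the weights. The flexibility described above therefore depends on the choice of kernel.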
In this work, a GPR model was defined using the neuroimaging features as inputs (i.e. independent variables) and chronological age as the output (i.e. dependent variable). The GPR model was implemented in Python using the scikit-learn toolbox (Pedregosa et al., 2011). Its kernel was the sum of a dot-product kernel and a white-noise kernel. The obtained feature vectors were used as inputs to the GPR model for training. We used a 20-fold cross-validation strategy in which each fold was a random split of the dataset into 500 training subjects (81.7%), with the remaining subjects used for testing (18.3%). Model performance was evaluated using the mean absolute error (MAE) and the coefficient of determination (R²) of the predictions. We also compared the performance of GPR models with that of Support Vector Regression (SVR) models (Basak et al., 2007), which are more commonly used for regression problems in the ML literature. Because GPR outperformed SVR, it was chosen as the regression model for all further analyses.
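A sketch of this evaluation loop with scikit-learn follows. The 20 random splits (500 training subjects, the rest for testing) are expressed with `ShuffleSplit`. The following are assumptions rather than settings reported in the text:

- `normalize_y=True`;
- the default kernel hyperparameter initialization and optimization;
- the function name `evaluate_gpr`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import ShuffleSplit

def evaluate_gpr(X, age, n_splits=20, train_size=500, seed=0):
    """Mean MAE and R^2 of a dot-product + white-noise GPR over random splits."""
    splitter = ShuffleSplit(n_splits=n_splits, train_size=train_size,
                            random_state=seed)
    maes, r2s = [], []
    for train_idx, test_idx in splitter.split(X):
        # Kernel hyperparameters are re-optimized on each training split
        kernel = DotProduct() + WhiteKernel()
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gpr.fit(X[train_idx], age[train_idx])
        pred = gpr.predict(X[test_idx])
        maes.append(mean_absolute_error(age[test_idx], pred))
        r2s.append(r2_score(age[test_idx], pred))
    return float(np.mean(maes)), float(np.mean(r2s))
```

With the default arguments and a 613-subject feature matrix, this mirrors the split scheme described above. An SVR baseline could be evaluated in the same loop by swapping in `sklearn.svm.SVR`.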
2.3.2 Similarity metric
Following Cole et al. (2017b), we represented the training data as an N × N similarity matrix, N being the number of subjects in the training set. The similarity between any two subjects was calculated as the dot product between their corresponding feature vectors. We also examined a different similarity metric, namely cosine similarity, which yielded comparable performance. Each testing subject was therefore represented as a 500-element vector containing its similarity to each of the 500 training subjects.
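The similarity representation can be sketched in a few lines of NumPy. The sketch assumes z-scored feature matrices with subjects as rows; the function names are hypothetical. The rows of `S_train` and `S_test` would then serve as GPR inputs in place of the raw features.

```python
import numpy as np

def similarity_features(X_train, X_test):
    """Dot-product similarity representation of training and test subjects."""
    S_train = X_train @ X_train.T  # (n_train, n_train) similarity matrix
    S_test = X_test @ X_train.T    # (n_test, n_train): one row per test subject
    return S_train, S_test

def cosine_similarity_features(X_train, X_test):
    """Same representation after normalizing each subject's vector to unit length."""
    unit_train = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    unit_test = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    return similarity_features(unit_train, unit_test)
```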
However, the use of a similarity metric entails two issues:

(1) the training set needs to contain enough subjects to sample the spectrum of healthy aging completely;
(2) the predictions are based on how similar a test subject is to each of the training subjects, rather than on the neuroimaging features themselves.

To avoid these issues, we used dimensionality reduction techniques, namely PCA and CCA, to identify the features that contribute most to brain age prediction. PCA and CCA project the data onto a lower-dimensional space and allow ML models to represent age as a function of neuroimaging features rather than of similarity scores. This approach allowed us to visualize the age-related neuroimaging features after the model was trained.
2.3.3 Principal Component Analysis (PCA)
PCA is a linear dimensionality reduction technique that uses the singular value decomposition (SVD) of the data to project it onto a lower-dimensional space (Jolliffe, 2002). It is widely used to decompose multivariate datasets into a set of successive orthogonal components that explain the maximum amount of variance in the data (e.g. Amico and Goñi, 2017; Larivière et al., 2019). The principal components correspond to the main modes of variation and hence capture the most prominently varying features in the dataset. The number of principal components is often selected visually at the point where the total explained variance starts to plateau as more components are added (the “knee rule”).

We applied PCA to project the feature matrix onto a lower-dimensional space and subsequently estimated brain age using a GPR model. In each case, the number of retained components was chosen at the knee of the curve relating the explained variance to the number of principal components. Five components were retained in all models using only MRI data, and 10 components in all models using MEG data or a combination of MRI and MEG data. In every case, the retained principal components explained about 60–66% of the variance in the data.
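The sketch below pairs PCA with GPR in a scikit-learn pipeline, so that the principal components are estimated on the training subjects only. The text describes a visual knee selection. The `knee_n_components` helper therefore implements one possible automatic substitute: it picks the point of maximum distance from the chord of the cumulative explained-variance curve. This is a hypothetical heuristic, not the authors' procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from sklearn.pipeline import make_pipeline

def knee_n_components(X, max_components=50):
    """Hypothetical knee heuristic on the cumulative explained-variance curve.

    Returns the number of components whose point on the curve lies farthest
    from the straight line joining the first and last points.
    """
    n = min(max_components, *X.shape)
    cum = np.cumsum(PCA(n_components=n).fit(X).explained_variance_ratio_)
    k = np.arange(1, n + 1)
    x0, y0, x1, y1 = 1, cum[0], n, cum[-1]
    # Unnormalized point-to-line distance (the constant denominator is irrelevant)
    dist = np.abs((y1 - y0) * k - (x1 - x0) * cum + x1 * y0 - y1 * x0)
    return int(k[np.argmax(dist)])

def pca_gpr(n_components):
    """PCA projection followed by GPR, fitted together on the training data."""
    return make_pipeline(
        PCA(n_components=n_components),
        GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel(),
                                 normalize_y=True),
    )
```

Following the text, `pca_gpr(5)` would correspond to the MRI-only models and `pca_gpr(10)` to the MEG and combined models.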
2.3.4 Canonical Correlation Analysis (CCA)
CCA is a multivariate associative technique that identifies latent variables, namely linear combinations of the input variables and of the output variables, that are maximally correlated with each other (Thompson, 2005). CCA has been successfully applied in the context of brain-behavior relationships (Smith et al., 2015), neurodegenerative diseases (Avants et al., 2014) and psychopathology (Xia et al., 2018). Like PCA, CCA can be computed with the SVD and used to reduce the dimensionality of the data. However, PCA decomposes the covariance matrix of the input variables, whereas CCA decomposes the (whitened) cross-covariance matrix between the input and output variables. Therefore, the obtained canonical components are maximally correlated with the output variables.
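The contrast with PCA can be made explicit with the standard CCA formulation. Consider inputs X with covariance Σ_XX, outputs Y with covariance Σ_YY, and cross-covariance Σ_XY. The first canonical pair maximizes the correlation below and can be obtained from an SVD. When the number of features exceeds the number of subjects, Σ_XX is singular, and in practice a regularized or iterative solver is required; the text does not state how this was handled.

```latex
% First canonical correlation between inputs X (n x p) and outputs Y (n x q)
\rho_1 = \max_{\mathbf{a}, \mathbf{b}}
  \frac{\mathbf{a}^\top \Sigma_{XY} \mathbf{b}}
       {\sqrt{\mathbf{a}^\top \Sigma_{XX} \mathbf{a}}\;\sqrt{\mathbf{b}^\top \Sigma_{YY} \mathbf{b}}}

% Solution via the SVD of the whitened cross-covariance matrix
\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} = U S V^\top, \qquad
\mathbf{a}_1 = \Sigma_{XX}^{-1/2} \mathbf{u}_1, \qquad
\mathbf{b}_1 = \Sigma_{YY}^{-1/2} \mathbf{v}_1, \qquad
\rho_1 = s_1

% PCA, by contrast, decomposes only the input covariance
\Sigma_{XX} = W \Lambda W^\top

% Single output (q = 1, y = age): the canonical weights are proportional
% to the least-squares regression coefficients
\mathbf{a}_1 \propto \Sigma_{XX}^{-1} \boldsymbol{\sigma}_{Xy}
```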
In the present case, the CCA inputs were the neuroimaging feature matrices and the output was the chronological age vector. CCA therefore retrieved a linear combination of the neuroimaging features that was maximally correlated with the age of the individuals. We used CCA to project the feature vectors onto this direction and subsequently used the projection values to predict age with GPR.
For every CCA component, CCA also yields a loading vector that quantifies the contribution of each feature to that component (Wang et al., 2018). We used these loading values to assess the contribution of each feature to brain age prediction. This in turn showed which brain regions exhibit the most pronounced age-related changes. To estimate the reliability of these loading values, we used the bootstrap ratio (BSR). We repeated the CCA analysis on 1000 bootstrap samples of the dataset, each drawn at random with replacement (Efron and Tibshirani, 1986; McIntosh and Lobaugh, 2004). The BSR of the loading values indicates which areas contribute reliably to brain age prediction, thereby making the interpretation of the prediction models more robust. The procedure for generating the BSR of the loading values is illustrated in Fig. 2.
We also examined deep CCA (Andrew et al., 2013) to learn a non-linear combination of features that maximally covary with age. However, deep CCA was not numerically stable and hence it was not explored further.
3 Results
A summary of the results is presented in Fig. 3, and the performance of the different brain age prediction methods is detailed in Tables 1 & 2. To estimate the chance level of age prediction, we used the predictions of a random model with no training. Irrespective of the data modality, the chance-level MAE was ∼16.74 years and R² was around zero. These values served as a baseline for assessing the performance of the various models. All models, irrespective of data modality, performed better than chance. This indicates that all the considered neuroimaging features exhibited some age-related effects.
3.1 Dimensionality reduction techniques
We compared the performance of all models using 20-fold cross-validation, with random training (500 subjects) and testing (112 subjects) splits of the dataset in each fold. First, we compared the different dimensionality reduction techniques with the similarity metric presented by Cole et al. (2017b). For this comparison, we used only the voxel-wise T1-weighted intensity levels (from all tissues) as input to the different models. CCA yielded the best age prediction performance, with an MAE of 5.57 yrs (Table 1). PCA markedly degraded performance, yielding an MAE of 9.23 yrs. Using GPR on the similarity scores yielded worse performance (MAE of 7.3 yrs) than using GPR on the raw features (MAE of 5.64 yrs). The poorer performance of the similarity metric may be due to the sample size of our dataset, which was much smaller than the dataset used by Cole et al. (2017b). This could have led to incomplete sampling of the aging subspace and hence to worse performance.
Similar trends were observed when only cortical MRI, subcortical MRI, or MEG features were used. Specifically, the similarity metric and PCA both degraded performance compared with GPR on the raw features (Fig. 3), whereas CCA yielded the best age prediction performance.
3.2 Cortical vs. subcortical MRI features
To delineate the contributions of cortical and subcortical MRI features, we compared the performance obtained with each of these feature sets separately. Subcortical MRI features clearly outperformed cortical features, irrespective of the model used (Fig. 3 & Table 1). CCA yielded the best performance in each case, with an MAE of 5.97 yrs for subcortical MRI features and 7.27 yrs for cortical MRI features (Table 1). These results indicate that subcortical regions were more reliable indicators of brain age than cortical ones. This finding was further supported by the CCA loadings of MRI features, with subcortical regions exhibiting higher BSR of loading values than cortical regions (Fig. 4b).
3.3 Combining structural and functional features from MRI and MEG data
We extracted the relative PSD of the MEG data in 7 frequency bands for each brain region, as well as the AEC and ILC measures to quantify functional connectivity (Tewarie et al., 2016). However, ILC values did not contribute significantly to age prediction, with MAE values very close to those of the random model (Table 2). PSD performed better than AEC with GPR, but AEC performed better than PSD with CCA (Table 2). Combining all MEG features in a CCA+GPR model yielded the best MEG-based performance, with an MAE of 9.67 yrs. However, this performance was considerably worse than that obtained with MRI features (MAE of 5.57 yrs).
An important consideration when comparing MEG and MRI results is that the MEG features only contained information from the cortex, whereas the MRI intensities came from both cortical and subcortical regions. We therefore compared models using MEG features with those using only cortical MRI features. Across all methods, MEG features yielded worse performance than cortical MRI features, by 2.45 yrs for GPR, 2.03 yrs for similarity, 2.61 yrs for PCA, and 2.4 yrs for CCA. This suggests that MEG features alone were poorer predictors of brain age than MRI features.
To assess the potential of multimodal prediction, we combined all MRI and MEG features to predict brain age. As before, the CCA+GPR model yielded the best performance, with an MAE of 5.28 yrs (Table 2). This was slightly better than using only MRI features (MAE of 5.57 yrs) or even only subcortical MRI features (MAE of 5.45 yrs). This suggests that the MEG features added information complementary to the structural features, which improved brain age prediction.
3.4 CCA loadings
One of the goals of the present work was to identify the brain regions that exhibit more pronounced age-related changes. The CCA loadings provided a way to assess the contribution of each neuroimaging feature to age prediction, and thus to identify the features that exhibited the most reliable age-related changes. Fig. 4a shows the histogram of the BSR of voxel intensity loading values, and Fig. 4b shows the top 15% BSR of loading values for GM, WM, and CSF. The histogram of BSR values indicates that GM and WM voxels exhibited more reliable age-related changes than CSF. The histogram peaks for GM and WM were located around −300 and −400, respectively, whereas the peak for CSF was located around −100. Almost all loading values were negative, indicating decreasing voxel intensity with increasing age.

Furthermore, the top 15% of BSR values were confined to subcortical regions, supporting our earlier finding that subcortical regions yield better age prediction. Some of these areas are shown in Fig. 4b, and 3D NIfTI volumes of the CCA loadings are available in NeuroVault (https://identifiers.org/neurovault.collection:6091). The highlighted (red) GM areas were localized in subcortical structures such as the putamen, thalamus, and caudate nucleus, as well as in regions of the cerebellum. Most of the highlighted (green) WM voxels were confined to the corpus callosum. This indicates that the corpus callosum showed the most consistent age-related changes among WM voxels. Another WM structure that exhibited age-related changes was the thalamic radiation.
Furthermore, we visualized the CCA loadings of the MEG features. The PSD loadings are shown in Fig. 5 and the AEC loadings in Supp. Fig. 1. Comparing the BSR values, we found that PSD values were more reliable (BSR values ∼450) than AEC values (BSR values ∼300).

Regarding the PSD loadings (Fig. 5), different regions showed age-related effects in different frequency bands. Unlike the MRI loadings, which were mostly negative, the PSD loadings were both positive and negative. The low-frequency bands exhibited decreasing PSD values with age. Delta and theta band PSD showed maximal age-related effects in frontal areas, and alpha band PSD in visual and motor areas. Higher frequency bands (beta and gamma) exhibited increasing PSD values with age in frontal and motor areas.

Regarding the AEC loadings (Supp. Fig. 1), the all-to-all connectivity matrices (one per frequency band) were sorted into functional networks according to the Yeo 7-network cortical parcellation (Yeo et al., 2011). Most functional connections exhibited increased connectivity with age in all frequency bands. The exception was the visual network, which showed decreased connectivity with age in the high alpha and high beta bands. The ILC loadings are not shown, since ILC values did not contribute significantly to age prediction (Table 2).
4 Discussion
In this study, we aimed to leverage multimodal neuroimaging data to predict age in a large cohort of healthy subjects (N=613) aged 18–88 years. We applied dimensionality reduction techniques in conjunction with ML. Combining MRI and MEG features in a CCA+GPR model yielded the best performance, whereas PCA was detrimental to the performance of GPR. We identified and visualized the regions that exhibited age-related changes, and found that subcortical T1-weighted intensity levels predicted age more reliably than cortical ones. Using the MEG data, we also characterized age-related changes in the spectral features of various cortical regions. In addition, we demonstrated that multivariate associative techniques such as CCA provide better explainability of predictive models. This may contribute to the identification of clinically relevant biomarkers of pathological aging.
4.1 Dimensionality reduction techniques
We used T1-weighted MR images and resting-state MEG data to develop a brain age prediction framework that uses both structural and functional information about the brain. We restricted our analysis to cortical sources of the MEG data, so the functional information came from cortical regions only. Since the goal was to predict age, a perfect model would achieve an MAE of 0 yr. However, we did not expect to achieve this, owing to inter-subject variability and the ill-posed nature of the problem. In particular, it is difficult to define precisely what constitutes a “healthy” subject.
We used similarity metric-derived feature vectors to compare the performance of different regression models, namely GPR and SVR. Consistent with the results of Cole et al. (2017b), GPR outperformed SVR, and we therefore used GPR as the regression model of choice for age prediction. Furthermore, we explored the contribution of dimensionality reduction techniques to age prediction. PCA is one of the most commonly used dimensionality reduction techniques in neuroimaging; however, in our study PCA degraded prediction performance. This result may be explained by the large inter-subject variability of neuroimaging features. PCA yields the components of maximal variance in the dataset, and these may be aligned with directions of inter-subject variability rather than with age-related changes. Our results therefore suggest that PCA-based dimensionality reduction does not perform well for brain age prediction. In contrast, CCA improved performance by yielding the component that is maximally correlated with age, thereby identifying the features that are most informative for age prediction.
4.2 Combining structural and functional features from MRI and MEG data
Combining structural information from MR images and functional information from MEG recordings resulted in a small improvement in brain age prediction for all models (Tables 1 & 2). This may indicate that MEG features carry non-redundant information for age prediction. Among the MEG features, PSD and AEC values were better predictors of age than ILC values. These results are in agreement with previous EEG studies that reported improved brain age prediction using power spectral features (Dimitriadis and Salis, 2017; Sun et al., 2019; Zoubi et al., 2018).
One possible reason why including MEG features yielded only a modest improvement in prediction is that the extracted features were restricted to cortical regions. As MRI features from subcortical structures were found to be the best age predictors, we speculate that including functional features from deep brain structures could have resulted in a greater improvement. Newly developed methods that more reliably detect activity in deep brain structures with MEG (Pizzo et al., 2019) could therefore contribute to improved age prediction in future studies.
4.3 CCA loadings
Apart from yielding the best prediction accuracy, CCA was used to identify the brain regions that contribute most reliably to age prediction. CCA returns a loading value for each input feature, thereby improving model explainability. Using the BSR of the loading values for MRI features, we found that most voxel T1-weighted intensities were negatively correlated with age (Fig. 4a). A decrease in voxel intensities with age has been reported by Salat et al. (2009), who suggested that this association was an indicator of brain atrophy. Thus, our findings are in agreement with previous studies that have reported cortical thinning with age (Fjell et al., 2009; Hogstrom et al., 2013; Salat et al., 2004; Storsve et al., 2014).
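As an illustration of how a bootstrap ratio (BSR) of CCA loadings can be obtained, the NumPy sketch below assumes that loadings are structure coefficients (the correlation of each feature with the canonical variate) and that the BSR is the full-sample loading divided by its bootstrap standard deviation. Both definitions, the function names, and the synthetic features are assumptions for illustration and may differ from the exact procedure used in the study. For a single target, the canonical variate is the least-squares fit of age, so it always correlates positively with age and no sign alignment across bootstrap samples is needed.

```python
import numpy as np


def cca_loadings(X, y):
    """Hypothetical helper: correlation of each feature with the single-target CCA variate."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Canonical weights for one target are proportional to the least-squares
    # coefficients; the resulting variate correlates positively with y.
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    u = Xc @ w
    u_std = (u - u.mean()) / u.std()
    X_std = Xc / Xc.std(axis=0)
    # Mean product of standardized variables = Pearson correlation
    return X_std.T @ u_std / len(y)


def bootstrap_ratios(X, y, n_boot=500, seed=0):
    """Hypothetical helper: full-sample loading divided by its bootstrap standard deviation."""
    rng = np.random.default_rng(seed)
    original = cca_loadings(X, y)
    boot = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # resample subjects with replacement
        boot[b] = cca_loadings(X[idx], y[idx])
    return original / boot.std(axis=0, ddof=1)


# Synthetic example: three hypothetical features for 300 subjects
rng = np.random.default_rng(1)
age = rng.uniform(18, 88, 300)
X = np.column_stack([
    -0.05 * age + rng.normal(0, 1, 300),  # declines with age
    rng.normal(0, 1, 300),                # unrelated to age
    0.02 * age + rng.normal(0, 1, 300),   # weakly increases with age
])
bsr = bootstrap_ratios(X, age, n_boot=200)
print(np.round(bsr, 1))
```

By convention, features with a large absolute BSR (for example above 2 or 3, by analogy with a z-score) are treated as reliable contributors; the threshold here is an assumption, not necessarily the one used in this work.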
Furthermore, our results indicate that subcortical regions are more reliable predictors of age than cortical regions. The brain structures that most reliably exhibited age-related changes included the putamen, thalamus, and caudate nucleus, which are involved in relaying a variety of information across the brain, in sensorimotor coordination, and in higher cognitive functions (Grahn et al., 2008; Sefcsik et al., 2009; Sherman and Guillery, 2002). A number of stereological and MRI studies have reported age-related atrophy in subcortical regions, specifically in the putamen (Bugiani et al., 1978), amygdala (Coffey et al., 1992; Fjell et al., 2013), hippocampus (Fjell et al., 2013; Nobis et al., 2019), caudate nucleus (Krishnan et al., 1990), substantia nigra (McGeer et al., 1977), thalamus (Sullivan et al., 2004; Fjell et al., 2013), and cerebellum (Andersen et al., 2003; Good et al., 2001; Torvik et al., 1986). A recent study using a large cohort also reported an age-related decrease in hippocampal and temporal lobe volumes (Nobis et al., 2019). Hence, our findings are in agreement with the age-related changes in the size of specific brain areas reported in previous studies. However, we cannot rule out the possibility that the absence of strong negative correlations between age and MRI voxel intensities in the cortex could be attributed to improper alignment of sulci and gyri to the standard MNI152 brain template.
We found that the WM regions affected by age were mostly confined to the corpus callosum and the thalamic radiation. These results are in strong agreement with previous studies that have reported age-related alterations in WM structures (Salat et al., 2005), such as atrophy of corpus callosum fiber tracts (Ota et al., 2006; Pfefferbaum et al., 2000) and of the thalamic radiation (Cox et al., 2016). Although CCA loadings for CSF voxels exhibited lower BSR values than their GM and WM counterparts, including CSF voxels improved model performance, possibly because CSF information reflects age-related changes in brain volume and ventricle size.
Among the examined MEG features, PSD and AEC values yielded the best performance; however, the PSD values were more reliable than the AEC values (compare BSR values in Fig. 5 and Supp. Fig. 1). The BSRs of the PSD loadings were either positive or negative depending on the frequency band (Fig. 5), indicating both positive and negative associations with age. Our results showed that delta and theta power decrease with age, most prominently in frontal regions. These results are consistent with reports that slower waves (0.5–7 Hz) have lower power in older adults than in younger adults (Caplan et al., 2015; Cummins and Finnigan, 2007; Leirer et al., 2011; Vlahou et al., 2014). Increased frontal theta activity has been linked to better performance in memory tasks (Jensen and Tesche, 2002; Onton et al., 2005), which suggests that the decrease in low-frequency power with increasing age may be linked to age-related memory decline. Regarding the alpha band, the strongest effect of age was observed in the occipital cortex, where power within the higher alpha subband (10–13 Hz) was negatively correlated with age. These results align with several studies that have reported a decrease in alpha power with increasing age (Gómez et al., 2013; Hübner et al., 2018). However, other studies have not reported significant changes in alpha power with age (Heinrichs-Graham and Wilson, 2016; Xifra-Porxas et al., 2019). It is likely that the latter studies lacked the statistical power to detect this age-related decrease, since their cohorts comprised fewer than 35 subjects, whereas the studies that reported an association between alpha power and age (including ours) had sample sizes larger than 85.
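To make the band-wise PSD features concrete, the NumPy sketch below computes relative band power for one source time series from a Hann-windowed periodogram. The high-alpha (10–13 Hz), low-beta (13–26 Hz) and high-beta (26–35 Hz) edges follow the text; the delta, theta and low-alpha edges, the use of relative rather than absolute power, and the single-periodogram estimator (rather than, for example, Welch averaging) are assumptions for illustration, not the study's preprocessing.

```python
import numpy as np

# Band edges in Hz. Only high_alpha, low_beta and high_beta are quoted in the
# text; the remaining edges are hypothetical choices for this sketch.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "low_alpha": (8.0, 10.0),
    "high_alpha": (10.0, 13.0),
    "low_beta": (13.0, 26.0),
    "high_beta": (26.0, 35.0),
}


def relative_band_power(signal, fs, bands=BANDS):
    """Power in each band divided by total power over all bands (periodogram estimate)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    lo = min(edge[0] for edge in bands.values())
    hi = max(edge[1] for edge in bands.values())
    total = spectrum[(freqs >= lo) & (freqs < hi)].sum()
    return {name: spectrum[(freqs >= f_lo) & (freqs < f_hi)].sum() / total
            for name, (f_lo, f_hi) in bands.items()}


# Synthetic example: a 11.5 Hz oscillation plus white noise, 60 s at 250 Hz
fs = 250.0
t = np.arange(int(60 * fs)) / fs
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 11.5 * t) + 0.5 * rng.normal(size=t.size)
powers = relative_band_power(sig, fs)
print(max(powers, key=powers.get))
```

Because the bands are contiguous, the relative powers sum to one, which makes values comparable across subjects whose overall signal amplitude differs.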
In line with many previous studies, we observed an association between beta power and age (Heinrichs-Graham and Wilson, 2016; Hübner et al., 2018; Rossiter et al., 2014; Xifra-Porxas et al., 2019). Specifically, we found that the age-related increase in lower beta power (13–26 Hz) was restricted to frontal regions, whereas the increase in higher beta power (26–35 Hz) was restricted to the motor cortex. Pharmacological studies have linked increased beta power to higher levels of intracortical GABAergic inhibition (Hall et al., 2011; Muthukumaraswamy et al., 2013). This suggests that the age-related changes in beta power might be associated with local GABAergic inhibitory function.
Finally, we found that AEC measures exhibited an age-related increase in connectivity within all frequency bands across all brain networks, except for the visual network, which showed a decrease in connectivity in the high alpha and high beta frequency bands. These results align well with a recent study in which Larivière et al. reported lower beta-band connectivity in the visual network and higher beta-band connectivity in all other brain networks with age (Larivière et al., 2019). Higher functional connectivity in older adults has been associated with lower cognitive reserve (López et al., 2014), and individuals with mild cognitive impairment exhibit stronger functional connections (Buldú et al., 2011). Overall, these studies suggest that the age-related increase in functional connectivity observed in our study may play a role in modulating cognitive resources and could therefore represent a marker of the cognitive decline observed during aging.
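For readers unfamiliar with AEC, the NumPy sketch below shows the core computation: the Pearson correlation between the amplitude envelopes (magnitude of the analytic signal) of two band-limited signals. The analytic signal is built with the same FFT construction that `scipy.signal.hilbert` uses. The synthetic signals are hypothetical, and steps that MEG pipelines often add, such as band-pass filtering and pairwise orthogonalization to reduce source leakage, are deliberately omitted, so this is not necessarily the exact AEC variant used in the study.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def amplitude_envelope_correlation(x, y):
    """Pearson correlation between the amplitude envelopes of two band-limited signals."""
    env_x = np.abs(analytic_signal(x))
    env_y = np.abs(analytic_signal(y))
    return np.corrcoef(env_x, env_y)[0, 1]


# Synthetic 20 Hz (beta) carriers: x and y share a slow amplitude modulation,
# z has an independent one. 20 s at 250 Hz.
fs = 250.0
t = np.arange(int(20 * fs)) / fs
rng = np.random.default_rng(0)
shared_env = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)
other_env = 1.0 + 0.5 * np.sin(2 * np.pi * 0.35 * t + 1.0)
x = shared_env * np.sin(2 * np.pi * 20 * t) + 0.05 * rng.normal(size=t.size)
y = shared_env * np.sin(2 * np.pi * 20 * t + 1.2) + 0.05 * rng.normal(size=t.size)
z = other_env * np.sin(2 * np.pi * 20 * t) + 0.05 * rng.normal(size=t.size)

aec_xy = amplitude_envelope_correlation(x, y)
aec_xz = amplitude_envelope_correlation(x, z)
print(round(aec_xy, 2), round(aec_xz, 2))
```

Note that x and y have different carrier phases yet a high AEC: the measure captures co-fluctuations of amplitude, not phase alignment.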
4.4 Limitations
A limitation of brain age prediction is the use of chronological age as a surrogate for brain age. Although our cohort comprised healthy subjects, brain age is known to depend on various other factors, such as education (Steffener et al., 2016a). In this work, we ignored all lifestyle factors and aimed to predict biological age from neuroimaging features alone. Further, we used a single model to predict brain age for both males and females. Together, these factors mean that the age labels used to train our models are only noisy versions of the “true” brain age of each subject.
A major drawback of using GPR as a regression model is that predictions are reliable only within the age range of the training data. Further, our cohort had fewer subjects in the lowest and highest age ranges. Therefore, the model returned biased age predictions for the youngest and oldest subjects. This is illustrated in Fig. 6, which shows predicted age vs. chronological age for the CCA+GPR model using all neuroimaging features. The predicted ages of the youngest subjects were higher than their chronological ages, whereas the opposite was true for the oldest subjects. To reduce this prediction bias, future work could adopt the estimation methods proposed by Smith et al. (2019) to optimally calculate the difference between chronological and biological age.
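The bias described above (over-prediction for the youngest subjects, under-prediction for the oldest) is often handled with a simple linear correction. The NumPy sketch below shows one commonly used variant: fit predicted age as a linear function of chronological age, then subtract the age-dependent part of the error. This is an illustrative swap-in, not the method of Smith et al. (2019), and the simulated predictions are hypothetical, only qualitatively resembling the pattern described for Fig. 6. In practice the slope and intercept should be estimated on training or validation data rather than on the test subjects.

```python
import numpy as np


def fit_age_bias(chronological_age, predicted_age):
    """Fit predicted = slope * age + intercept (np.polyfit returns highest degree first)."""
    slope, intercept = np.polyfit(chronological_age, predicted_age, deg=1)
    return slope, intercept


def correct_age_bias(chronological_age, predicted_age, slope, intercept):
    """Remove the linear, age-dependent component of the prediction error."""
    expected = slope * chronological_age + intercept
    return predicted_age - expected + chronological_age


# Hypothetical predictions shrunk toward the cohort mean (regression to the mean)
rng = np.random.default_rng(0)
age = rng.uniform(18, 88, 400)
predicted = age.mean() + 0.7 * (age - age.mean()) + rng.normal(0, 5, age.size)

slope, intercept = fit_age_bias(age, predicted)
corrected = correct_age_bias(age, predicted, slope, intercept)

# Slope of the brain-age gap (predicted - chronological) against age
gap_slope_before = np.polyfit(age, predicted - age, 1)[0]
gap_slope_after = np.polyfit(age, corrected - age, 1)[0]
print(round(gap_slope_before, 2), round(gap_slope_after, 6))
```

Because the corrected gap equals the residual of the linear fit, it is uncorrelated with chronological age on the data used for fitting; this removes the age dependence of the gap but does not by itself make the predictions more informative.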
5 Conclusions
We used a combination of structural and functional brain information to predict brain age in a cohort of healthy subjects, which yielded slightly better performance than using a single neuroimaging modality. We showed that dimensionality reduction techniques can be used to improve brain age prediction and identify key neuroimaging features that show age-related effects. Specifically, we found that using CCA in conjunction with GPR yielded the best model for age prediction, whereas PCA degraded prediction performance. We also showed that the most reliable predictors of age were MRI features from subcortical structures such as the putamen, thalamus, and caudate nucleus, and from WM regions such as the corpus callosum.
Acknowledgments
We wish to thank Drs. Stefanie Blain-Moraes and Karim Jerbi for valuable discussions regarding this work. Data collection and sharing for this project was provided by the Cambridge Centre for Ageing and Neuroscience (Cam-CAN). Cam-CAN funding was provided by the UK Biotechnology and Biological Sciences Research Council (grant number BB/H008217/1), together with support from the Medical Research Council (MRC) Cognition & Brain Sciences Unit (CBU) and the European Union Horizon 2020 LifeBrain project. This work was supported by funds from the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants RGPIN-2017-05270 [MHB] and RGPIN-2019-06638 [GDM], the Fonds de la Recherche du Quebec - Nature et Technologies (FRQNT) Team Grant 254680-2018 [GDM], and scholarships from McGill University and Quebec Bio-Imaging Network [AXP & AG] and MITACS [AG]. The research was undertaken thanks in part to funding from the Canada First Research Excellence Fund, awarded to McGill University as part of the Healthy Brains for Healthy Lives initiative.