Abstract
Introduction Volumetric estimates of subcortical and cortical structures, extracted from T1-weighted MRIs, are widely used in many clinical and research applications. Here, we investigate the impact of the presence of white matter hyperintensities (WMHs) on FreeSurfer grey matter (GM) structure volumes and its possible bias on functional relationships.
Methods T1-weighted images from 1077 participants (4321 timepoints) from the Alzheimer’s Disease Neuroimaging Initiative were processed with FreeSurfer version 6.0.0. WMHs were segmented using a previously validated algorithm on either T2-weighted or Fluid-attenuated inversion recovery (FLAIR) images. Mixed effects models were used to assess the relationships between overlapping WMHs and GM structure volumes and overal WMH burden, as well as to investigate whether such overlaps impact associations with age, diagnosis, and cognitive performance.
Results Participants with higher WMH volumes had higher overalps with GM volumes of bilateral caudate, cerebral cortex, putamen, thalamus, pallidum, and accumbens areas (P < 0.0001). When not corrected for WMHs, caudate volumes increased with age (P < 0.0001) and were not different between cognitively healthy individuals and age-matched probable Alzheimer’s disease patients. After correcting for WMHs, caudate volumes decreased with age (P < 0.0001), and Alzheimer’s disease patients had lower caudate volumes than cognitively healthy individuals (P < 0.01). Uncorrected caudate volume was not associated with ADAS13 scores, whereas corrected lower caudate volumes were significantly associated with poorer cognitive performance (P < 0.0001).
Conclusions Presence of WMHs leads to systematic inaccuracies in GM segmentations, particularly for the caudate, which can also change clinical associations. While specifically measured for the Freesurfer toolkit, this problem likely affects other algorithms.
INTRODUCTION
White matter hyperintenesities (WMHs) are defined as areas of increased signal on T2-weighted (T2w) and Fluid-attenuated inversion recovery (FLAIR) magnetic resonance images (MRIs) (Raman et al., 2016). WMHs are associated with a variety of underlying pathologies, such as amyloid angiopathy, arteriosclerosis, axonal loss, blood-brain barrier leakage, degeneration, demyelination, gliosis, hypoperfusion, hypoxia, and inflammation (Abraham et al., 2016). WMHs are commonly present in the otherwise asymptomatic aging population but are at higher prevalence in many diseases such as Alzheimer’s disease (AD), diabetes, frontotemporal dementia, HIV, lewy body dementia, mild cognitive impairment (MCI), obesity, Parkinson’s disease, and vascular dementia (Appelman et al., 2009; Debette and Markus, 2010; Caroppo et al., 2014; Dadar et al., 2018b; Gouw et al., 2008; Kandiah et al., 2013; Sudre et al., 2017; Barber et al., 1999).
On T1-weighted (T1w) MRI sequences, WMHs appear hypointense with respect to the normal-appearing white matter, with intensities that can be very similar to cortical and subcortical grey matter (GM) (Dadar et al., 2017a). The T1w intensity of WMHs is also associated with severity of damage to the tissue, with areas of higher damage appearing more hypointense (Dadar et al., 2019).
As T1w images are the most commonly used structural MRI sequences in clinical and neuroscience applications, especially for purposes of segmentation and estimation of volumes for all or specific structures of interest (Mateos-Pérez et al., 2018), the similarity in T1w intensity profiles of WMHs and GM gives rise to an important methodological question: can T1w MRI-based GM structure segmentation differentiate between WMHs and GM? If not, how much of WMHs will be tagged as GM in their segmentation estimates, and if so, is this error systematic enough to bias results?
To answer this question, we propose our study of WMHs segmentation bias in subcortical and cortical GM structures. More specifically, we investigated 1) whether there was any systematic overlap between WMHs and GM segmentations, and therefore volumetric biases; and 2) whether this overlap affected clinical findings (i.e. associations with cognitive scores). To these ends we used two segmentation tools, the first being FreeSurfer, one of the most commonly used publicly available brain segmentation tools (Fischl, 2012), and a previously validated tool for WMHs segmentation on multi-contrast MRIs (Dadar et al., 2017b), both applied on longitudinal data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database.
METHODS
Participants
We used longitudinal data from 1077 participants (4321 individual images from various timepoints) from the ADNI-1, ADNI-2, and ADNI-GO database (adni.loni.usc.edu) that had T1w and either T2w/PDw or FLAIR MRIs available. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The study was approved by the institutional review board of all participating sites and written informed consent was obtained from all participants before inclusion in the study.
MRI acquisition and preprocessing
Table 1 summarizes MR imaging parameters for the data used in this study.
Scanner information and MRI acquisition parameters for ADNI1, and ADNI2/GO datasets.
GM Segmentations
All T1w images were identically processed using FreeSurfer version 6.0.0 (recon-all-all). FreeSurfer is an open source software (https://surfer.nmr.mgh.harvard.edu/) that provides a full processing stream for structural T1w data (Fischl, 2012). The final segmentation output (aseg.mgz) was then used to obtain structure masks and volumes based on the look up table available at https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/AnatomicalROI/FreeSurferColorLUT.
WMH Segmentations
T1w, T2w/PDw, and FLAIR scans were pre-processed as follows: (a) image denoising (Manjón et al., 2010); (b) intensity inhomogeneity correction; and (c) intensity normalization to a 0-100 range. For each subject, the T2w, PDw, or FLAIR scans were then co-registered to the structural T1w scan of the same timepoint using a 6-parameter rigid registration and a mutual information objective function (Dadar et al., 2018a). We used a previously validated WMH segmentation method that employs a set of location and intensity features in combination with a random forests classifier to detect WMHs using either T1w+FLAIR or T1w+T2w/PDw images. The training library consisted in manual, expert segmentations of WMHs from 100 subjects from ADNI (not included in the current sample) (Dadar et al., 2017c, 2017b, 2018b). WMHs were automatically segmented at all timepoints and then co-registered to the T1w images using the obtained rigid registrations, in order to assess overlaps between WMHs and GM segmentations.
Cognitive Evaluations
All subjects received a comprehensive battery of clinical assessments and cognitive testing based on a standardized protocol (adni.loni.usc.edu) (Petersen et al., 2010). At each visit, participants underwent a series of assessments including the Alzheimer’s Disease Assessment Scale-13 (ADAS13) (Mohs and Cohen, 1987), which was used to assess cognitive performance.
Statistical Analyses
Overlaps between GM and WMH segmentations were calculated (number of overlapping voxels in mm3) for each subcortical and cortical GM region. The following mixed effects models were used to assess whether the WMH-GM overlaps were associated with overall WMH burden, controlling for age and sex.
Mixed effects models were also used to assess the relationships between GM volumes and age and diagnostic cohort, and GM volumes and cognition, once using the GM volume estimates obtained directly from the FreeSurfer segmentation, and once after removing the regions overlapping with the WMH segmentations.
All volumes were normalized by the individual’s intracranial volume. Total WMH loads and WMH-GM overlaps were log-transformed to obtain normal distributions. All mixed effects models included Subject as well as Scanner Model and Field Strength as categorical random variables, to account for any potential variabilities caused by contrast differences in the images from different scanners. The results were corrected for multiple comparisons using the false discovery rate (FDR) controlling method with a significance threshold of P = 0.05 (Benjamini and Hochberg, 1995).
Data and Code Availability Statement
Data used in this article is available at http://adni.loni.usc.edu/. FreeSurfer and the WMH segmentation pipeline used are also publicly available at https://surfer.nmr.mgh.harvard.edu/ and http://nist.mni.mcgill.ca/?p=221, respectively.
RESULTS
Study participants
Preprocessed and registered images were visually assessed for quality control (presence of imaging artifacts, failure in registrations). WMH segmentations were also visually assessed for missing hyperintensities or over-segmentation. Either failures resulted in the participant being removed from the analyses. All MRI processing, segmentation and quality control steps were blinded to clinical outcomes. All cases passed co-registration QC. Figure 1 summarizes the QC information for the subjects that were excluded. The final sample included 1077 participants (4321 timpoints) with WMH and FreeSurfer segmentations.
Flowchart of subjects in the study.
Segmentations
Figure 2 shows WMH and caudate segmentations for three subjects with no overlap, some overlap, and high overlap as an example.
Overlap between WMH and caudate segmentations. WMH= White Matter Hyperintensity. Blue= Caudate. Green= WMHs. Red= The overlapping voxels between caudate and WMH segmentations.
Segmentation overlap
Table 2 shows the average amount of overlap between WMHs and FreeSurfer segmentations for each GM structure, as well as their association with the overall WMH burden, controlling for age and sex (eq.1). Caudate segmentations had by far the highest percentage of overlapping WMHs (6% of the mean caudate volume). The overlapping volumes were significantly related to the overall WMH burden for bilateral caudate, cerebral cortex, putamen, thalamus, pallidum, accumbens area, and the right hippocampus (P<0.0002). Figure 2 shows the associations for the top six regions.
Overlaps between WMHs and FreeSurfer GM segmentations, and their associations with overal WMH burden. The regions are sorted based on effect size. Significant results after FDR correction are indicated in bold font.
To better demonstrate how much the misclassification of WMHs as caudate might increase caudate volume estimates, we have also plotted volume overlaps for caudate without log-transformation (i.e. volumes in mm3) in Figure 4. In extreme cases, for individuals with very high WMH burden (log transformed value of 11, equivalent to 50,000 mm3), caudate volume can be over-estimated by more than 1000 mm3, equivalent to 30% of the average caudate volume.
The association between overlapping GM and WMH volumes and overall WMH burden (Table 2).
The association between overlapping Caudate and WMH volumes (in mm3) and overall WMH burden.
Associations
Table 3 shows the associations between GM volumes and age (eq. 2), before and after removing the voxels overlapping with WMHs. Uncorrected caudate volumes increased with age, and MCI and AD groups had slightly higher volumes than the normal aging (NA) cohort (although not significant), whereas the corrected caudate volumes decreased with age, and MCI and AD groups had slightly lower volumes than the NA group (the AD vs NA difference was significant). The uncorrected and corrected results were similar in terms of effect size and direction of associations for other regions, with the corrected volumes having slightly larger effects sizes for putamen.
Associations between uncorrected and corrected GM volumes, age, and diagnostic cohort. Significant results after FDR correction are indicated in bold font.
Table 4 shows the associations between ADAS13 and GM volumes, before and after removing the voxels overlapping with WMHs. Uncorrected caudate volumes were not significantly associated with ADAS13 scores, whereas lower corrected caudate volumes were significantly associated with higher ADAS13 scores (i.e. poorer cognitive performance). Figure 5 shows the associations between uncorrected and corrected caudate volumes and ADAS13 scores. The uncorrected and corrected results were similar in terms of effect size and direction of associations for other regions, with the corrected volumes having slightly larger effects sizes for putamen.
Associations between uncorrected and corrected GM volumes and ADAS13. Significant results after FDR correction are indicated in bold font.
The association between uncorrected and corrected Caudate volumes ADAS13 scores.
DISCUSSION
In this study, we investigated the impact of WMHs on GM segmentations, specifically for the Freesurfer segmentation tool, in order to determine whether WMHs might lead to systematic segmentation errors in certain GM regions, and whether such errors might impact associations between GM volumes and clinical outcomes. We found such errors in a number of regions, and in the caudate in particular it propagated to the association with clinical variables.
We found that overlapping voxel volumes between WMH and GM segmentations were significantly associated with overal WMH burden (Table 2), indicating higher error rates for subjects with high WMH loads. This affected both cortical and subcortical structures, and in particular the caudates bilaterally. Uncorrected, caudate volumes showed a significant increase with age, which is highly unlikely to be a real effect given that all regions, except the caudate, have been shown to decline in late-life cognitively healthy individuals (Potvin et al., 2016b, 2016a). This was further improbable given that the population of this study consists of not only aging individuals but also patients with MCI and AD, which are known to have increasing levels of atrophy across cortical areas. In contrast, the corrected caudate volumes significantly decreased with age, as could be expected. Given that WMHs are highly prevalent in the periventricular regions (i.e. white matter areas surrounding the caudate), the significant increase estimate is likely due to the fact that WMH burden increases with age and AD and MCI patients tend to have higher WMH loads and faster WMH progression.
Along the same line, corrected caudate volumes showed significant differences between AD and cognitively healthy cohorts, as well as significant associations with cognitive performance in the expected directions, whereas the uncorrected volumes were not significantly different and in the opposite direction. This again highlights the fact that if uncorrected for overlapping WMHs, estimates of caudate volumes can lead to incorrect or unreliable findings, particularly in populations with high prevalence of WMHs.
While other GM regions also had overlaps with WMHs that were significantly associated with the overal WMH burden (e.g. cerebral cortex, putamen), the amount and percentage of these overlaps were not nearly as high as caudate (17, 22, 80, and 74 mm3 versus 217 and 223 mm3 or 0.4%, 0.5%, 0.04%, and 0.04% versus 6.1% and 6.5%), and therefore, they did not affect their overall estimates and associations with age, diagnosis, and ADAS13. However, this might not be the case in other populations, where WMHs are more prevalent, or have different distributions.
FreeSurfer is one of the most widely used publicly available brain segmentation tools. Large databases such as the UK Biobank provide GM structure volumes derived from FreeSurfer to researchers. In line with our findings, other researchers have reported a positive association between WMH load and FreeSurfer caudate volumes in the UK biobank participants (Morys et al., 2020) and in another large sample of cognitively healthy individuals (Potvin et al., 2017a, 2017b). Studies investigating a larger age range report a U-shape curve for caudate volumes, decreasing from early adulthood to the 60s, and then increasing afterwards (Fjell et al., 2013, 2009; Goodro et al., 2012; Pfefferbaum et al., 2013; Potvin et al., 2016b; Walhovd et al., 2011). Given that WMHs generally occur in this same age range (i.e. after 60s), these results are also likely due to the segmentation errors caused by presence of WMHs in older participants.
Specifically for this algorithm, our study emphasizes the need for correcting FreeSurfer GM volume estimates for WMHs, particularly for the caudate. However, it is likely that other algorithms exhibit the same behavior. Evidence can be found in the literature, for example in the works by Goodro et al. and Pfefferbaum et al. using FSL volumes (Goodro et al., 2012; Pfefferbaum et al., 2013). Developers and users alike should therefore be aware of the possibility of systematic bias from not taking into account WMHs in GM segmentation from T1w images.
In conclusion, the presence of WMHs can lead to systematic errors in GM segmentations in certain regions, particularly in the caudate, which, if not corrected, can impact findings in populations with high WMH prevalence.
Acknowledgements
MD is supported by a scholarship from the Canadian Consortium on Neurodegeneration in Aging in which SD and RC are co-investigators. The Consortium is supported by a grant from the Canadian Institutes of Health Research with funding from several partners including the Alzheimer Society of Canada, Sanofi, and Women’s Brain Health Initiative.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Footnotes
* Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-ontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf