Brain Age Prediction: Cortical and Subcortical Shape Covariation in the Developing Human Brain

Cortical development is characterized by distinct spatial and temporal patterns of maturational changes across various cortical shape measures. There is a growing interest in summarizing complex developmental patterns into a single index, which can be used to characterize an individual’s brain age. We conducted this study with two primary aims. First, we sought to quantify covariation patterns for a variety of cortical shape measures, including cortical thickness, gray matter volume, surface area, mean curvature, and travel depth, as well as white matter volume, and subcortical gray matter volume. We examined these measures in a sample of 869 participants aged 5-18 from the Healthy Brain Network (HBN) neurodevelopmental cohort using the Joint and Individual Variation Explained (Lock et al., 2013) method. We validated our results in an independent dataset from the Nathan Kline Institute - Rockland Sample (NKI-RS; N=210) and found remarkable consistency for some covariation patterns. Second, we assessed whether covariation patterns in the brain can be used to accurately predict a person’s chronological age. Using ridge regression, we showed that covariation patterns can predict chronological age with high accuracy, reflected by our ability to cross-validate our model in an independent sample with a correlation coefficient of 0.84 between chronologic and predicted age. These covariation patterns also predicted sex with high accuracy (AUC=0.85), and explained a substantial portion of variation in full scale intelligence quotient (R2=0.10). In summary, we found significant covariation across different cortical shape measures and subcortical gray matter volumes. In addition, each shape measure exhibited distinct covariations that could not be accounted for by other shape measures. These covariation patterns accurately predicted chronological age, sex and general cognitive ability. In a subset of NKI-RS, test-retest (<1 month apart, N=120) and longitudinal scans (1.22 ± 0.29 years apart, N=77) were available, allowing us to demonstrate high reliability for the prediction models obtained and the ability to detect subtle differences in the longitudinal scan interval among participants (median and median absolute deviation of absolute differences between predicted age difference and real age difference = 0.53 ± 0.47 years, r=0.24, p-value=0.04).


Introduction
The human brain undergoes dramatic structural changes throughout the lifespan, including brain maturation processes from childhood to adolescence and young adulthood. Characterizing typical developmental trajectories of brain structure maturation is essential for understanding neurodevelopmental disorders and vulnerabilities of the brain (Gogtay et al., 2004;Toga et al., 2006). Most studies of brain development have focused on the cortex and subcortical structures. For instance, developmental trajectories of cortical thickness, cortical and subcortical gray matter (GM) volume, surface area, mean curvature, and white matter (WM) development in different age groups have been extensively examined using both cross-sectional (Asato et al., 2010;Barnea-Goraly et al., 2005;Hill et al., 2010) and longitudinal data (Giedd et al., 1999;Gogtay et al., 2004;Lenroot et al., 2007;Mensen et al., 2017;Sowell et al., 2004). It is well recognized that cortical development is spatially heterogeneous with individual brain regions following distinct temporal patterns of maturational changes (Toga et al., 2006). To best characterize the complexity of cortical development across different brain regions, investigators have begun to use brain structural features to derive a comprehensive index of brain development (Cole and Franke, 2017;Erus et al., 2015).
Brain-based age prediction is thought to be important for understanding normal brain developmental process in healthy individuals, as well as atypical brain structural/functional developmental patterns that might have clinical implications (Davatzikos et al., 2009;Dosenbach et al., 2010). Among brain structural features, cortical thickness (Khundrakpam et al., 2015;Lewis et al., 2018) and GM volume (Franke et al., 2010) have been used to predict brain age. An increasingly broad range of possible measures exist for characterizing brain cortical structures, including surface area, mean curvature, travel depth, and WM volume; additionally, some studies have used the GM/WM difference or contrast (Franke et al., 2012;Lewis et al., 2018) or developed composite metrics incorporating multiple features to increase the brain age prediction accuracy (Brown et al., 2012;Erus et al., 2015;Lewis et al., 2018).
However, it remains unclear to what extent different cortical shape measures covary and whether their covariation patterns are related to chronological age.
To characterize covariation patterns across different cortical and subcortical measures, we use the Joint and Individual Variation Explained (JIVE) method (Lock et al., 2013) to estimate shared (i.e., covariation patterns common across all shape measures) and distinct (i.e., covariation patterns specific to an individual shape measure) covariation patterns across six cortical shape measures (cortical thickness, GM volume, surface area, mean curvature, travel depth, WM volume) and subcortical GM volumes. JIVE was developed for simultaneously identifying consistent patterns across multiple data types and patterns unique to individual data types. Specifically, JIVE decomposes the total variance in data into three terms: joint variation across all seven shape measures, structured variation unique to individual shape measures, and residual noise that should be discarded from analyses. JIVE has been used in cancer studies to identify genetic variants across different data platforms (e.g., genotyping, mRNA and miRNA expression) (Hellton and Thoresen, 2016;O'Connell and Lock, 2016) and to identify the common variance between task-based fMRI connectivity and a wide range of behavioral measures (Yu et al., 2017). We hypothesized that integrative analysis of different cortical and subcortical shape measures can provide a more comprehensive picture of brain cortical and subcortical development from childhood to early adulthood, and thus may generate insights into the relationship between brain maturation and cognitive development.
The main purposes of this study were to investigate (1) whether there are common and distinct covariation patterns across a set of the aforementioned six brain cortical shape measures along with subcortical GM volume, and (2) whether those common and distinct covariation patterns can accurately predict age using ridge regression (Hoerl and Kennard, 1970). Due to the impact of sex on brain structural differences and the relationships between cognitive function and brain structure (Erus et al., 2015;Kaczkurkin et al., 2019;Ruigrok et al., 2014;Schmithorst, 2009), we also assessed whether those JIVE components could predict sex and full scale intelligence quotient (FSIQ). To determine whether this approach is generalizable, we validated both our JIVE analysis and age prediction results in an independent dataset. Importantly, this replication involved the application of the JIVE components and prediction models derived in one sample (i.e., CMI Healthy Brain Network) to an independent sample that employed distinct imaging protocols and recruitment strategies.

Datasets
Detailed descriptions about the participants and neuroimaging data used in the study are outlined below. All brain features used in this study were extracted from T1-weighted (MPRAGE) scans.

Training data: The Healthy Brain Network cohort
To characterize covariations common across and/or distinct within different structural shape measures, we used T1-weighted MRI scans from the Healthy Brain Network (HBN) project (Alexander et al., 2017). The HBN is an ongoing initiative focused on creating and sharing a biobank comprised of data from up to 10,000 New York City area children and adolescents (https://healthybrainnetwork.org/). A self-referred community sampling strategy is used, with more than 80% of children being identified with one or more mental health or learning disorders.
Our analyses were based on N = 869 individuals (530 male, 339 female, mean age = 10.54 ± 3.26, age range: 5. mean FSIQ: 98.81 ± 16.42; collected from three imaging sites used by the HBN. Structural MRIs were acquired at either 1.5T (mobile MRI Scanner in Staten Island, N=630) or 3T (Siemens Tim Trio at Rutgers University, N=11; Siemens Prisma at Cornell Weill Medical College, N=228) using standard T1weighted sequences. Participants with serious neurological disorders (e.g., Huntington's disease, amyotrophic lateral sclerosis, multiple sclerosis, cerebral palsy), acute brain dysfunction, diagnosis of schizophrenia, schizoaffective disorder, bipolar disorder, manic or psychotic episode within the past six months, or history of lifetime substance dependence requiring chemical replacement therapy were excluded from the study.

Independent validation data: the NKI-RS longitudinal sample
T1-weighted MRI scans from the Nathan Kline Institute -Rockland Sample (NKI-RS) (Nooner et al., 2012) child longitudinal study (http://rocklandsample.org/child-longitudinal-study) were used as our independent validation data. The child longitudinal study is an ongoing project aimed to generate and share a large-scale, community-ascertained longitudinal sample for understanding comprehensive growth curves of brain function and structure. Per study protocol, T1-weighted MPRAGE scans (TR = 1900; voxel size = 1mm isotropic) are collected for each participant at up to three time points. We had access to the baseline data of N = 210 participants (122 male, 88 female, mean age = 12.31 ± 3.06, age range: 6.68 -17.94 years; mean FSIQ: 105.00 ± 13.49; FSIQ range: 77 -142). Recruitment of study participants was designed to maximize community representativeness. Among those with the baseline data, 120 subjects had retest imaging data obtained 3~4 weeks later, and 77 subjects had longitudinal follow-up scans (mean inter-scan interval 1.22 ± 0.29, range 0.31 -1.92 years).

MRI preprocessing
MRI preprocessing entailed two complementary processes: standard MRI preprocessing by the FreeSurfer (Dale et al., 1999) software package (https://surfer.nmr.mgh.harvard.edu) with subsequent feature extraction and shape analysis by the Mindboggle software package (Klein et al., 2017) (https://mindboggle.info). Briefly, FreeSurfer's recon-all pipeline (v.5.3.0) performed motion correction, intensity normalization, skull stripping, subcortical segmentation, tissue classification, and surface extraction. Preprocessing quality was assessed by a number of standard quality control measurements (Shehzad et al., 2015). MRI data failing quality control analyses were removed from further analysis. The preprocessed T1-weighted MRI data were then imported into Mindboggle to extract shape measures for further analysis. The technical details of Mindboggle have been described elsewhere (Klein et al., 2017). Mindboggle has been shown to improve brain region labeling and provides a variety of precise brain shape estimates.

Brain shape measures
We aimed to characterize covariations among different shape measures for cortical surface and subcortical regions. We focused on six measures, including cortical mean curvature, white matter volume, cortical thickness, surface area, gray matter volume, and travel depth for 62 regions of interest (ROIs) (31 ROIs per hemisphere) per the Desikan-Killiany-Tourville (DKT) protocol (Klein and Tourville, 2012). For a given ROI, we used the median shape value across all vertices in that ROI. Additionally, we included gray matter volume estimates for 16 subcortical ROIs. Thus, the shape measures in this study can be regarded as data from seven different sources.

Dimension reduction by Joint and Independent Variance Explained (JIVE)
Investigating each brain shape individually might fail to identify important interrelationships among different shape measures. We hypothesized the existence of hidden covariation patterns shared across and/or specific to individual shape measures. We therefore used JIVE (Lock et al., 2013) to identify such hidden patterns that might be of biological interest. JIVE extends Principal Component Analysis to data from multiple sources (e.g., different shape measures) and was developed primarily for discovering shared covariation patterns (i.e., joint components) in different sets of measures, as well as systematic variations (i.e., individual components) specific to an individual data source. Specifically, we denote each shape measure by single ROI and each column representing a participant. Since different shape measures have different biological implications and they are at different scales, we regard those as data from different sources. Following Lock et al. (Lock et al., 2013), JIVE decomposes total variation in data into three major parts (joint components, individual components, and error terms): where : × and 566 = 7

Age, sex and IQ prediction by ridge regression
We used ridge regression models (Hoerl and Kennard, 1970) to assess whether the joint and/or individual components are predictive of age, sex, or FSIQ. Specifically, using JIVE joint and/or individual components as input data, ridge regression models were used to predict chronological age and FSIQ, while ridge logistic regression models were used to predict sex. As a control, we also considered the predictive power of total volumes of subcortical gray matter, intracranium, brain stem, and CSF, which are known to be related to age and sex. Therefore, the input predictors took seven different formats: total volumes only, joint components only, individual components only, total volumes + joint components, total volumes + individual components, joint components + individual components, and total volumes + joint components + individual components.
Briefly, ridge regression models are linear models with ridge penalty which has been shown to offer good prediction performance with high dimensional correlated input data. In mathematic format, given a continuous response variable $ , and a set of predictors $C , ridge regression estimates the parameters C by minimizing ∑ E $ − ∑ C $C C G ? H $<I + ∑ C ? C . The tuning parameter controls the model's complexity. If = 0, ridge regression becomes a traditional linear regression model. The optimal choice of the parameter in this study was based on 10fold cross validation.

Assessment of prediction accuracy
Age and FSIQ prediction accuracy were assessed using two criteria: Mean Absolute Error (MAE) and the coefficient of determination (R 2 ), where R 2 is the proportion of the variance in the dependent variable explained by the model. To determine whether adding additional predictors would improve model prediction accuracy, we report the adjusted R 2 (i.e., R 2 adjusted for the number of predictors in the model) for the training model based on HBN data. Sex prediction accuracy was assessed using area under receiver operating characteristic curve (AUC). To demonstrate how the models generalize to data from another study, we predicted age, FSIQ, and sex in the NKI-RS with models trained on HBN data. We report model prediction accuracy assessments from both HBN and NKI-RS data.

Variance decompositions of shape measures across brain regions
JIVE analysis of 388 brain features (6 cortical shape measures for each of 62 ROIs plus 16 subcortical GM volumes) from the HBN data resulted in a total of 35 lower dimensional representations of brain features (i.e., components). These included 2 joint components, 7, 4, 6, 5, 6, 2, and 3 individual components specific to mean curvature, WM volume, cortical thickness, surface area, GM volume, travel depth, and subcortical GM volumes, respectively. Figure 1 shows the amount of shared and individual variations among the seven shape measures.
Overall, joint components were responsible for more variation in cortical WM volume (48.3%), GM volume (52.1%), surface area (45.2%), and subcortical GM volume (41.6%) than in cortical thickness (31.7%), travel depth (15.1%), and mean curvature (11.2%). Covariation between cortical GM volume and surface area are expected as GM volume roughly equals the product of surface area and cortical thickness. In addition, each shape measure has a certain amount of structured variation (ranging from 9.8% to 33.0%) that is unrelated to other shape measures. It should also be noted that we found considerable residual noise in travel depth (67.2%), mean curvature (56.4%), cortical thickness (47.6%), and surface area (42.3%).
Of particular interest, we showed that the identified JIVE components were reproducible in an independent dataset. Figure  and gray matter volume (r=0.78), respectively, had correlation coefficients above 0.8 or just below (0.78). Thus, a total of 10 JIVE components replicated well in the NKI-RS dataset. It should be noted that most of the individual components specific to cortical GM volume, cortical thickness, and surface area were less consistent between samples, as evidenced by their relatively weak correlations (r < 0.5) in loadings between HBN and NKI-RS components.
To assess whether the 10 components that replicated in the NKI-RS dataset were related to age, sex, and FSIQ, we employed a linear regression model with each component as the only regressor to predict age, sex, and FSIQ, respectively, using the HBN data (Table 1). We found that half of the reproducible components were related to age, FSIQ, and/or sex. Specifically, joint component 2 and components specific to cortical thickness, subcortical GM volume, and mean curvature explained 29%, 18%, 8%, and 3% of total variation in age, respectively, while joint component 1 by itself explained 6% of total variation in FSIQ and predicted sex with good accuracy (AUC=0.79). These results were validated in the NKI-RS cross-sectional sample.

Brain regions with large loadings in highly reproducible components
To identify what shape measures in which regions could show similar variation patterns across all samples, we examined brain regions and their corresponding shape measures with loading magnitudes greater than 0.1 in the identified components. There are 388 loading estimates in each joint component representing covariation on all seven shape measures across various brain regions. Individual components related to shape measure for cortical and subcortical regions had 62 and 16 loading estimates, respectively. Here, we focused on the two joint components and the component specific to cortical thickness due to their relevance to age, FSIQ, and sex prediction.
A small number of loadings had large magnitudes (defined as loading magnitude greater than 0.1) in both joint component 1 (24 out of 388 loadings, 6.2%) and joint component 2 (27 out of 388 loadings, 7.0%). Specifically, both joint components had large loadings on left and right thalamus proper, ventral diencephalon, and superior frontal WM volumes. Shape measures in those regions predicted age (Radj 2 ranging from 4% to 16%), sex (AUC around 0.70), and FSIQ (Radj 2 ranging from 1% to 3%). Since joint component 1 was largely related to sex and FSIQ while joint component 2 was mostly related to age, brain shape measures with large loadings in both joint components are expected to be related to age, FSIQ, and sex. Regions with large loadings on joint component 1 only (Table 2) were mainly cortical GM volume, surface area, and WM volume in left and right rostral middle frontal and superior frontal, along with GM volume in subcortical regions (caudate, putamen, and hippocampus). Shape measures in those regions individually explained up to 5% of total variation in FSIQ and predicted sex with AUC ranging from 0.6 to 0.74. In contrast, regions with large loadings on joint component 2 only (Table 3) were mostly cortical thickness in the following regions: middle temporal, supramarginal, superior frontal, pars orbitalis, inferior and superior parietal in both hemispheres, left transverse temporal, right lateral orbitofrontal, right rostral anterior cingulate, right precuneus, and right superior temporal. Each shape measure in those regions predicted age (Radj 2 ranging from 3% to 20%). Table 4 summarizes regions with large loadings in the component specific to cortical thickness that by itself explained 18% variation in age. It should be noted that cortical thickness in left and right rostral and caudal anterior cingulate individually explained a decent amount of variation in age (Radj 2 ranging from 5% to 15%).
Our results (Tables 5 -7) revealed that among all 388 shape measures, those that individually explained at least 10% of variation in age were mainly cortical thickness; those that individually explained at least 4% of variation in FSIQ were either gray matter volume or cortical surface area; and those predicted sex with AUC above 0.70 were a mix of white matter volume, gray matter volume, and surface area.

Predicting age, sex, and FSIQ
Next, we aimed to determine the optimal multi-feature prediction models for age, sex and FSIQ.
Analyses showed that the 35 joint and individual components which we had identified in HBN data could accurately predict age. Table 8 summarizes model prediction accuracy measures from all seven sets of input predictors. The model with total volumes and joint and individual components gave the best prediction accuracy in term of age (MAE=1.41 years, Radj 2 =0.71) using the model trained by the HBN data and making prediction in the independent NKI-RS cross-sectional dataset. Prediction accuracy of the same model predicting the HBN data was nearly identical (MAE=1.42 years, Radj 2 =0.70). We note that the joint components and the individual components each made unique contributions in predicting age. Thus, while the joint and independent components together explained 67% of total variation in age, models with the joint components only and with the individual components only explained 31% and 35% of total variation in age, respectively. Although the model with total volumes as the only predictors explained 34% of variation in age, total volumes explained only 3% of variation in age after controlling for the joint and individual components in the model.
The model with the best prediction accuracy for sex (AUC=0.85) and FSIQ (MAE=12.85 points, Radj 2 =0.10) in the HBN data was the one with joint and individual components plus total volumes, and the results were validated in the independent NKI-RS data for both sex (AUC=0.84) and FSIQ (MAE=11.70, R 2 =0.08). Similar to age prediction, the joint components and the individual components made separate contributions in predicting FSIQ. Interestingly, we observed that the age prediction residuals, defined as chronological age -predicted age, were negatively associated with FSIQ ( = −0.11, p-value < 0.001).

Validation by test-retest and longitudinal data
Our model has excellent test-retest reliability. The joint and individual component scores based on the JIVE analysis of the HBN data were calculated for both test and retest data. The predicted brain age and FSIQ were obtained for both test and retest subjects using the prediction models from ridge regression analysis of the HBN data. The intraclass correlation coefficient for measuring agreement of the predicted values between test and retest subjects was 0.97 for age and 0.96 for FSIQ.
Finally, for the longitudinal data available in NKI-RS, we noted that the longitudinal interval between scans was somewhat variable (mean = 1.22 year later ± 0.29). As such, we took the opportunity to test the ability to detect differences in this interval across subjects. We found that the median and median absolute deviation of absolute differences between the real age difference and the predicted age difference were 0.53 and 0.47 years, respectively. A significant positive correlation ( = 0.24, p-value =0.04) was observed between the real age difference and the predicted age difference.

Integrative analysis of disparate data types
A major challenge in the analysis of brain imaging studies is the integration of disparate data types, given the substantial covariation across different shape measures as well as within each shape measure. If such covariation in brain measurements cannot be effectively summarized into a lower dimensional representation of brain structure, then no satisfactory performance can be produced by existing statistical models (Bzdok and Yeo, 2017). This highlights the importance of dimension reduction in analysis of neuroimaging data (Cunningham and Yu, 2014;Zhao and Castellanos, 2016).
Motivated by this, we have extended a novel dimension reduction method, JIVE (Lock et al., 2013), initially developed for integrative analysis of different genetic data types, to brain imaging data analysis. We selected JIVE for two main reasons: simultaneous detection of covariation among different data types, and its ability to extract interpretable brain features. To our knowledge, no prior work has attempted to systematically study covariation patterns and changes on such a comprehensive set of brain structure measures across early developmental ages. Extracting interpretable brain features is particularly important in brain imaging studies where imaging confounds or artifacts, such as scanner differences, can easily dominate the results. JIVE is an extension of Principal Component Analysis for extracting linear features (i.e., components) within data. Therefore, it has the advantage of mapping each component back to the original variables and thus making interpretation possible. Our study shows that JIVE can effectively identify meaningful and highly reproducible common and distinct variations across a number of brain cortical and subcortical structural measures that predict age, sex, and FSIQ.

Highly reproducible common and distinct variations in brain shape measures
Prior research on identifying structural covariation has been primarily focused on either a single structural measure (Mechelli et al., 2005), or on analyzing different structural measures separately (Remer et al., 2017;Wierenga et al., 2014). Individual analyses might fail to capture the critical associations between different structural measures (Lock et al., 2013). In this study, we explored simultaneously the covariation patterns across a range of structural measures and all brain regions examined. We demonstrated covariation patterns that exist across different brain cortical and subcortical structure measures in various brain regions, as well as those that are specific to individual brain shape measures. Joint covariation patterns common to all shape measures and a subset of individual covariation patterns specific to travel depth, subcortical gray matter volume, mean curvature, and cortical thickness were remarkably consistent between HBN and NKI-RS datasets.
A question of interest is whether the reproducible covariation patterns in the present investigation are of biological interest. Interpreting the extracted covariation patterns was challenging and not straight-forward. We may not be able to specify the functionality of each identified covariation pattern, which is partly due to the functionality of different brain cortical and subcortical regions not yet being fully characterized (Mechelli et al., 2005.) As a starting point, we focused on their relevance to age, sex, and FSIQ prediction. We found that joint component 2 by itself explained 29% of total variation in age and that almost all its loadings with magnitude greater than 0.1 were cortical thickness measures mainly from frontal and temporal lobes. The component specific to cortical thickness by itself explained 18% of total variation in age. In addition, most of the shape measures in individual brain regions that explained at least 10% of total variation in age were cortical thickness (Table 5). Taken together, these results highlight the importance of cortical thickness as a sensitive index of brain development (Dosenbach et al., 2010;Khundrakpam et al., 2015). In contrast, joint component 1 was not related to age, but predicted sex with AUC of 0.79 by itself. Joint component 1 confirmed that sex differences are most extreme in subcortical volumes (Ritchie et al., 2018). It also revealed sex differences in three shape measures (white matter volume, gray matter volume, and surface area) in superior frontal and rostral middle frontal regions -two primary dorsolateral prefrontal cortex (dlPFC) regions. Lastly, joint component 1 explained about 6% of total variation in FSIQ. We observed that FSIQ correlated mostly with shape measures in the dlPFC regions and thalamus proper.
Covariation between dlPFC regions and thalamus proper is consistent with the thalamus as a key subcortical relay underlying executive functions, in line with findings from a tractography study (Le Reste et al., 2016). Therefore, although the neurobiological properties for the cortical and subcortical covariation patterns we identified remain to be fully dissected, these covariations can serve as a starting point for characterizing patterns related to brain maturation and cognitive development.
The covariation patterns specific to GM volume, cortical thickness, and mean curvature were in general less consistent between HBN and NKI-RS datasets (Figure 2). One possible explanation is that developmental trajectories of those measures change across different stages of life and their covariation patterns might be heterogeneous and nonlinear. For example, various studies have shown that cortical thickness exhibits an inverted-U trajectory from childhood to adulthood, GM declines in a regionally heterogeneous pattern, and mean curvature follows a combination of linear, quadratic, and logarithmic trajectories (Brain Development Cooperative, 2012; Giedd et al., 1999;Remer et al., 2017;Shaw et al., 2008;Tamnes et al., 2017;Wierenga et al., 2014). Therefore, JIVE, a method mainly used to identify linear covariation patterns, might fail to recognize nonlinear covariation patterns among shape measures. Furthermore, distinct genetic influences on surface area and thickness (Chen et al., 2013;Panizzon et al., 2009) and different evolutionary processes (Geschwind and Rakic, 2013; Stiles and Jernigan, 2010) may introduce complex individual variability, making it difficult to characterize the covariation patterns of those shape measures.

Age, sex and FSIQ prediction
One of the aims of the current study was to test whether common structural covariations and those which are specific within each shape measure can predict biological factors such as age, sex, and FSIQ. Ridge regression models with the identified brain covariation patterns as predictors accurately predicted age. The model with JIVE joint and individual components along with total volumes produced a mean absolute age prediction error of 1.41 years and explained 71% of total variation in age, making it one of the best age prediction models in the recent literature (Brown et al., 2012;Erus et al., 2015;Franke et al., 2012;Khundrakpam et al., 2015;Lewis et al., 2018). We also found that the JIVE joint and individual components together predicted sex with high accuracy (AUC=0.85) and explained 10% of total variation in FSIQ, achieving better prediction results than a recent study using T1 white/gray contrast to predict FSIQ (Lewis et al., 2018). To test the validity of the model's predictive performance, we applied it to an independent dataset (NKI-RS) and the results showed that the model generalized to the new dataset, has excellent test-retest reliability, and was able to capture longitudinal changes.
Finally, there is a growing interest in relating individual differences in brain maturation to cognition (Burgaleta et al., 2014;Erus et al., 2015;Lewis et al., 2018;Shaw et al., 2006). For example, (Erus et al., 2015) and (Lewis et al., 2018) both used a residual-based approach to investigate the relationship between brain development and cognition, where brain development was approximated by the age prediction residuals. Both studies showed a significant relationship between brain development and cognition, but the direction of this relationship was opposite in the two studies. Interestingly, we also found a significant negative correlation between the age prediction residuals and FSIQ, indicating that individuals with brain-based age younger than their chronological age had lower FSIQ. This aligns well with the finding of Erus et al. (Erus et al., 2015), although they assessed cognitive test performance in several domains rather than FSIQ. Nevertheless, we note that prediction residuals explained only 1.3% of total variation in FSIQ, so the statistically significant association between age prediction residuals and FSIQ we observed are not yet clinically important. In addition, interpretation of these associations should be made with caution, as conditions such as diabetes, schizophrenia, and traumatic brain injury have been linked to faster brain aging (Cole et al., 2015;Franke et al., 2013). Still, the directionality of our finding has face validity and it warrants further investigation.

Reproducibility
An important aspect of the present work is the high bar employed for examining reproducibility.
The JIVE components and prediction models were estimated using a sample that was entirely distinct from that used for testing. This is in contrast to combining the samples and using a less stringent cross-validation strategy or developing unique estimates for each sample. The two samples differed with respect to recruitment strategy, study design, the scanners and imaging protocols used; similarities included the principal investigator. We believe future work would merit from raising the bar for reproducibility to such standards as large-scale samples emerge.

Limitations
Studies have reported that cortical structure development is characterized in general by a mix of linear, curvilinear, and quadratic trajectories with region-specific developmental variations (Remer et al., 2017;Tamnes et al., 2017;Wierenga et al., 2014). However, JIVE is limited to detecting linear covariation patterns. Therefore, we were not able to detect nonlinear covariation patterns that may be common and distinct across cortical shape measures.
A second limitation is the cross-sectional nature of most of the data we examined. Covariation patterns captured by JIVE analysis do not account for within-participant variation. Brain structure changes dynamically throughout the lifespan and the rate of change varies by brain structure measures and brain regions (Storsve et al., 2014). Future explorations should include applying this analytical framework to longitudinal studies, evaluating how changes in various brain structure measures covary and whether the covariation patterns are related to developmental behaviors.
Third, there are concerns about the representativeness of the datasets employed. While the NKI-RS is a community-ascertained sample, with 42.4% having mental health disorders, the HBN is a self-referred community sample, with 81.5% of individuals having one or more mental health or learning diagnoses. Given the relatively large difference in effect sizes observed between age effects and those of psychiatric disorders, it is not surprising that performance was high despite the phenotypic variation in the HBN samples. Nonetheless, the heterogeneity may have hindered our ability to optimally detect small effects, such as IQ. As large-scale, representative samples such as NIH ABCD emerge, we expect that there will be opportunities to develop more optimized, representative models.
Lastly, the brain parcellation map used in this study was based on the Desikan-Killiany-Tourville labeling protocol (Klein and Tourville, 2012) which consists of 62 regions. Khundrakpam et al. (Khundrakpam et al., 2015) reported that the spatial scale of brain parcellation increases the accuracy of age prediction by cortical thickness, with the best estimations obtained for spatial resolutions consisting of 2,560 and 10,240 brain parcels. It is likely that JIVE analysis based on data from brain labels at finer resolutions such as HCP-MMP (Glasser et al., 2016) might improve the model prediction accuracy and produce structural covariation patterns with increased biological interpretability.
In summary, we used JIVE to identify the consistent covariation patterns common across as well as specific to different cortical and subcortical shape measures. These covariation patterns can accurately predict age and sex, and they also provide better prediction for FSIQ than the existing literature, while retaining neurobiological interpretability. Most importantly, we validated our results in an independent dataset, suggesting the generalizability of our approach in brain imaging studies.  Hemisphere; L: Left; R: Right * Radj 2 is the adjusted percentage of variation in Age or FSIQ explained by a linear model with the corresponding brain shape measure as the only predictor using HBN as training and NKI-RS as testing data. MAE is the mean absolute error of prediction, and AUC is the area under the ROC curve.     Radj 2 is the adjusted percentage of variation in Age or FSIQ explained by a linear model with the corresponding brain shape measure as the predictor using HBN as training and NKI-RS as testing data. MAE is the mean absolute error of prediction, and AUC is the area under the ROC curve.