Shared and unique brain network features predict cognition, personality and mental health in childhood

How individual differences in brain network organization track population-level behavioral variability is a fundamental question in systems neuroscience. Recent work suggests that resting-state and task-state functional connectivity can predict specific traits at the individual level. However, the focus of most studies on single behavioral traits has come at the expense of capturing broader relationships across behaviors. Here, we utilized a large-scale dataset of 1858 typically developing children to estimate whole-brain functional network organization that is predictive of individual differences in cognition, impulsivity-related personality, and mental health during rest and task states. Predictive network features were distinct across the broad behavioral domains of cognition, personality and mental health. On the other hand, traits within each behavioral domain were predicted by highly similar network features. This is surprising given decades of research emphasizing that distinct brain networks support different mental processes. Although tasks are known to modulate the functional connectome, we found that predictive network features were similar between resting and task states. Overall, our findings reveal brain network features that account for individual variation within broad domains of behavior in childhood, yet are unique to each behavioral domain.


Introduction
A central question in systems neuroscience is how brain network architecture supports the wide repertoire of human behavior across the lifespan. Childhood is a period of rapid neural development and behavioral changes across cognition, personality, and mental health (Steinberg 2005, Casey et al. 2008, Paus et al. 2008). Consequently, there is particular interest in understanding the nature of brain-behavior relationships instantiated early in the lifespan (Spear 2013, Larsen and Luna 2018). Here, we utilized a large-scale dataset of typically developing 9- to 10-year-old children (Volkow et al. 2018) to quantitatively characterize functional network organization that supports individual-level prediction of cognition, impulsivity-related personality, and mental health across resting and task states.
Whole-brain connectome-wide neurodevelopmental studies have found associations between resting-state functional network organization and behavioral traits (Satterthwaite et al. 2015, Karcher et al. 2019, Marek et al. 2019, Pornpattananangkul et al. 2019). However, clinical decisions are made at the individual level (Milham et al. 2017, Bzdok and Meyer-Lindenberg 2018). As such, there is an increasing shift from associational analyses to individual-level prediction (Dosenbach et al. 2010, Finn et al. 2015, Nostro et al. 2018). Using machine learning algorithms, we can exploit inter-individual heterogeneity in functional connectomes to make predictions about a single person's behavior (Finn et al. 2015). Consequently, neurodevelopmental prediction studies have used resting-state functional connectivity (FC) to predict individual differences in cognition (Evans et al. 2015, Sripada et al. 2019, Cui et al. 2020), impulsivity (Shannon et al. 2011) and autism symptoms (Uddin et al. 2013, Lake et al. 2019).
Recent studies have further suggested that task-state FC yields better prediction of cognition than resting-FC (Rosenberg et al. 2016, Jiang et al. 2019), with additional performance improvements from combining task-FC and resting-FC (Elliott et al. 2019, Gao et al. 2019). These improvements suggest that functional connections predictive of individual-level cognition (i.e., predictive network features) might differ between rest and task states. However, other studies have shown that brain functional network architecture is broadly similar during rest and task (Smith et al. 2009, Cole, Bassett, et al. 2014). Indeed, while task contexts reliably modulate functional network organization (Schultz and Cole 2016, Shine et al. 2016, Salehi et al. 2019), task modulation of the functional connectome within individuals is much smaller than differences between individuals (Gratton et al. 2018). Therefore, it remains unclear whether predictive network features differ across brain states. This is a central question we seek to address in this study.
Furthermore, most previous connectome-based prediction studies have focused on specific behavioral traits (Rosenberg et al. 2016, Nostro et al. 2018, Wang et al. 2018, Jiang et al. 2019, Lake et al. 2019, Sripada et al. 2019, Cui et al. 2020). Yet, the human brain has evolved to execute a diverse range of behaviors, so focusing on single behavioral traits might miss the forest for the trees (Holmes and Patrick 2018). More specifically, it remains unclear whether predictive network features are similar or different across behavioral measures. For example, specialized brain networks support distinct cognitive processes, such as attention or language (Corbetta and Shulman 2002, Fedorenko and Thompson-Schill 2014, DiNicola et al. 2020). Thus, one might expect distinct network features to support prediction of different cognitive traits. On the other hand, many studies have also emphasized information integration across specialized brain networks (van den Heuvel and Sporns 2011, Cole et al. 2013, Bertolero et al. 2018). Consequently, one might also expect a common set of predictive network features that explain individual differences in cognition. To systematically adjudicate between these two possible scenarios, we considered the prediction of a large number of behavioral measures. This population neuroscience re-assessment allowed us to estimate the degree of overlap in predictive network features across different behavioral domains (cognition, personality, mental health), as well as across phenotypes within the same behavioral domain.
In the present study, we utilized the Adolescent Brain Cognitive Development (ABCD) study, a unique dataset with a large sample of children and a diverse set of behavioral measures (Volkow et al. 2018). We used resting-FC and task-FC to predict a wide range of cognitive, impulsivity-related personality, and mental health measures. We also investigated whether combining resting-FC and task-FC can improve behavioral prediction. Most importantly, we explored the existence of shared and unique predictive network features within and across behavioral domains, as well as across brain (resting and task) states.

Results
We used resting-fMRI and task-fMRI from 11875 children (ABCD 2.0.1 release). There were three tasks: monetary incentive delay (MID), stop signal task (SST) and N-back. We also considered all available dimensional neurocognitive (Luciana et al. 2018) and mental health assessments, yielding 16 cognitive, 11 (impulsivity-related) personality and 9 mental health measures. After strict preprocessing quality control (QC) and considering only participants with complete resting-fMRI, task-fMRI and behavioral data, our main analyses utilized data from 1858 unrelated children (Figure 1A).

Figure 1. (A) Selection of participants. (B) 400 cortical parcels (Schaefer et al. 2018). Parcel colors are assigned according to 17 large-scale networks. (C) Nineteen subcortical regions (Fischl et al. 2002). Panels B and C were reproduced from Orban and colleagues (2020).

Task-FC outperforms resting-FC for predicting cognition, but not personality or mental health
We computed FC (Pearson's correlations) among the average time courses of 400 cortical (Schaefer et al. 2018) and 19 subcortical (Fischl et al. 2002) regions (Figures 1B & 1C), yielding a 419 x 419 FC matrix for each brain state (rest, MID, SST, N-back). We used kernel regression to predict each behavioral measure based on resting-FC, MID-FC, SST-FC and N-back-FC separately. We have previously demonstrated that kernel regression is a powerful approach for resting-FC behavioral prediction (He et al. 2020). The idea behind kernel regression is that subjects with more similar FC matrices exhibit more similar behavior.
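The logic of FC-based kernel regression can be sketched in a few lines. The snippet below is a minimal illustration with simulated data, not the study's actual pipeline: the region count, the kernel choice (correlation between vectorized FC matrices) and the closed-form kernel ridge solution are assumptions made for demonstration.

```python
import numpy as np

def fc_matrix(ts):
    """Pearson FC among regional time courses (ts: timepoints x regions)."""
    return np.corrcoef(ts.T)

def fc_features(fc):
    """Vectorize the unique edges (upper triangle) of an FC matrix."""
    iu = np.triu_indices_from(fc, k=1)
    return fc[iu]

def correlation_kernel(A, B):
    """Subject x subject similarity: Pearson correlation between FC vectors."""
    A = (A - A.mean(1, keepdims=True)) / A.std(1, keepdims=True)
    B = (B - B.mean(1, keepdims=True)) / B.std(1, keepdims=True)
    return A @ B.T / A.shape[1]

def kernel_ridge_fit(K_train, y, lam=1.0):
    """Closed-form kernel ridge solution: alpha = (K + lam*I)^-1 y."""
    return np.linalg.solve(K_train + lam * np.eye(len(y)), y)

def kernel_ridge_predict(K_test_train, alpha):
    """Prediction for a test subject: similarity-weighted combination of
    training coefficients (more similar connectomes contribute more)."""
    return K_test_train @ alpha
```

In this formulation, a test subject's predicted score is a similarity-weighted combination over training subjects, which makes the premise explicit: subjects with more similar FC matrices receive more similar predicted behavior.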
To evaluate the kernel regression performance, we utilized an inner-loop (nested) cross-validation procedure in which participants were repeatedly divided into training and test sets. The regression model was fitted on the training set and used to predict behavior in the test set. Care was taken so that participants from the same site were not split between training and test sets. This cross-validation procedure was repeated 120 times to ensure stability (Varoquaux et al. 2017). See Methods for more details. Figure 2A shows the prediction performance averaged within each behavioral domain. Each behavioral domain (cognition, impulsivity-related personality and mental health) was predicted better than chance (FDR q < 0.05; p < 0.0005 across all brain states). Consistent with previous studies, we found that MID-FC and N-back-FC outperformed resting-FC (p = 7.08e-08 and p = 4.85e-09 respectively) in predicting cognition. However, SST-FC had worse performance than resting-FC (p = 0.0082). In the case of personality and mental health, there was no statistical difference between resting-FC and any task state. Thus, task-FC appeared to improve prediction performance for cognition, but not personality or mental health.
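The site-aware splitting constraint described above can be sketched as follows. This is a hypothetical implementation for illustration; the study's exact fold construction, fold counts and repetition scheme may differ.

```python
import numpy as np

def site_aware_splits(sites, n_folds=3, n_reps=120, seed=0):
    """Yield (train_idx, test_idx) pairs in which all participants from a
    given acquisition site land in the same fold, so no site straddles
    the training and test sets. `sites` gives each participant's site."""
    rng = np.random.default_rng(seed)
    unique_sites = np.unique(sites)
    for _ in range(n_reps):
        # randomly deal sites (not participants) into folds for this repetition
        shuffled = rng.permutation(unique_sites)
        fold_of_site = {s: i % n_folds for i, s in enumerate(shuffled)}
        fold = np.array([fold_of_site[s] for s in sites])
        for k in range(n_folds):
            yield np.where(fold != k)[0], np.where(fold == k)[0]
```

Splitting at the site level rather than the participant level prevents the model from exploiting site-specific scanner or demographic signatures to inflate test accuracy.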

Figure 2. (A)
Cross-validated prediction performance (Pearson's correlation between observed and predicted values) using kernel ridge regression for the resting state and task states (MID, SST, N-back). Multi-kernel FC utilized FC from all 4 brain states for prediction. * denotes above-chance prediction after correction for multiple comparisons (FDR q < 0.05). ^ denotes a significantly different comparison after correction for multiple comparisons (FDR q < 0.05). The boxplots show the average accuracy across 120 replications. Task-FC appeared to improve prediction performance only for cognition, but not (impulsivity-related) personality or mental health. Multi-kernel FC improved prediction performance for cognition and personality, but not mental health. Similar conclusions were obtained using coefficient of determination (COD) instead of Pearson's correlation as a measure of prediction performance (Figure S1). MID: monetary incentive delay; SST: stop signal task. (B) The average difference in accuracy (Pearson's correlation between observed and predicted values) between the multi-kernel FC and N-back models across 120 replications. (C) Mental health measures. * denotes above-chance prediction after correcting for multiple comparisons (FDR q < 0.05). The boxplots show the average accuracy across 120 replications. Note the different scales across the three panels. The same set of behavioral measures was predicted better than chance when using coefficient of determination (COD) instead of Pearson's correlation as a measure of prediction performance (Figure S2).

Combining task-FC and resting-FC improves prediction of cognition and personality, but not mental health
Previous studies have suggested that combining task-FC and resting-FC can improve prediction of fluid intelligence (Elliott et al. 2019, Gao et al. 2019) and reading comprehension (Jiang et al. 2019). We extended these studies by performing multi-kernel ridge regression using resting-FC, MID-FC, SST-FC and N-back-FC jointly to predict a broader range of cognitive measures as well as non-cognitive (personality and mental health) measures. Figure 2A shows the multi-kernel prediction performance averaged within each behavioral domain. Since N-back performed the best among the single-kernel regressions for all behavioral domains (Figure 2A), we compared multi-kernel FC with N-back-FC (Figure 2B). We found that multi-kernel FC performed better than N-back-FC for cognition (p = 5.27e-06) and personality (p = 0.02), but not mental health (p = 0.12). Figure 3 shows the prediction performance of multi-kernel FC for all individual behaviors. As can be seen, prediction performance varied widely across behavioral measures. All 16 cognitive measures and 9 of the 11 personality measures were predicted significantly better than chance, as were 7 of the 9 mental health measures. On average, across behavioral measures that were predicted better than chance, the correlation between observed and predicted values was 0.316 ± 0.126 (mean ± std) for cognition, 0.103 ± 0.044 for personality and 0.120 ± 0.064 for mental health.
Thus, prediction performance was better for cognition than for personality or mental health. For example, the best predicted cognitive measure was crystallized cognition with an accuracy of r = 0.530, while the best predicted personality measure was positive urgency with an accuracy of r = 0.143 and the best predicted mental health measure was total psychosis symptoms with an accuracy of r = 0.184. Henceforth, we focus on the 32 behavioral measures that were significantly predicted by multi-kernel FC.
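The multi-kernel combination described above can be illustrated by summing per-state similarity kernels before fitting a single ridge model. This is a simplified sketch: the equal kernel weights and the closed-form solution are assumptions, and the study's actual hyperparameter and weight selection within nested cross-validation is not reproduced here.

```python
import numpy as np

def multi_kernel_ridge_fit(kernels, y, weights=None, lam=1.0):
    """Fit ridge regression on a weighted sum of per-state kernels
    (e.g., rest, MID, SST and N-back similarity matrices)."""
    if weights is None:
        # equal weighting across brain states (an illustrative default)
        weights = np.full(len(kernels), 1.0 / len(kernels))
    K = sum(w * Kk for w, Kk in zip(weights, kernels))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha, weights

def multi_kernel_ridge_predict(test_kernels, alpha, weights):
    """Predict test behavior from the test-by-train kernel of each state."""
    K = sum(w * Kk for w, Kk in zip(weights, test_kernels))
    return K @ alpha
```

Because each state contributes its own kernel, a state whose inter-subject similarity structure carries no behavioral signal simply dilutes, rather than dominates, the combined kernel.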
Focusing on these 32 measures, we were able to explore whether predictive brain network features were shared or unique across behavioral measures. The multi-kernel regression models were inverted (Haufe et al. 2014), yielding a 419 x 419 predictive-feature matrix for each brain state (rest, MID, SST, N-back) and each behavioral measure. Haufe's inversion approach yields a positive (or negative) predictive-feature value for an edge, indicating that higher FC for the edge was associated with predicting greater (or lower) behavioral values. Figure 4 shows the predictive-feature matrices for positive urgency and negative urgency across all brain states. All predictive-feature matrices can be found in Figures S3 to S6.
This inversion process is critical for interpreting supervised prediction models. Most previous studies have interpreted either the model weights or the selected features, which yields less interpretable results that are sensitive to the choice of regression model (Haufe et al. 2014). As shown in the control analyses below, the predictive features obtained after inversion were highly robust across regression models, underlining the importance of this step.
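For a single predicted target, Haufe's inversion has a compact form: the predictive-feature value of an edge is (up to scaling) the covariance, across subjects, between that edge's FC and the model's predicted behavioral score. A minimal sketch:

```python
import numpy as np

def haufe_transform(X, y_hat):
    """Haufe et al. (2014) inversion for a single predicted target.
    X: subjects x features (FC edges); y_hat: the model's predictions.
    Returns one value per feature: the covariance across subjects between
    that feature and the predictions. A positive value means higher FC on
    that edge pushes the predicted behavioral score upward."""
    Xc = X - X.mean(axis=0)
    yc = y_hat - y_hat.mean()
    return Xc.T @ yc / (len(yc) - 1)
```

Unlike raw regression weights, which can carry large values on noise-suppressing features, this activation pattern reflects only how features covary with the model's output, which is why it transfers so well across regression algorithms.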
As can be seen in Figure 4, the predictive features were very similar between positive urgency and negative urgency for each brain state. The predictive features were also similar across brain states, albeit to a lesser extent than the between-behavior similarity. To explore these phenomena more quantitatively, we first investigated whether predictive network features were similar across behavioral measures. Predictive-feature matrices for each behavioral measure were concatenated across brain states and correlated between behaviors, yielding the 32 x 32 matrix shown in Figure 5A. Here, the behavioral measures are ordered based on ABCD's classification of these measures into cognition, personality and mental health behavioral domains, so we refer to this ordering as "hypothesis-driven". A high value (green) for a pair of behavioral measures in the matrix (Figure 5A) indicates that the two measures are predicted by highly similar network features. As can be seen, the predictive-feature matrices were much more similar within each behavioral domain than across behavioral domains (Figure 5A).
Instead of ordering the behavioral measures in a hypothesis-driven fashion (Figure 5A), we also re-ordered the behavioral measures by hierarchical clustering of the predictive-feature matrices (Figure 6A). The hierarchical clustering yielded three data-driven behavioral clusters (Figure 6A) that were highly similar to the hypothesis-driven behavioral domains (Figure 5A). We again see that the predictive-feature matrices were much more similar within each data-driven behavioral domain than across domains.

We then tested whether predictive network features were similar across brain states. Since predictive-feature matrices were similar within each behavioral domain (Figure 5A), we averaged the predictive-feature matrices across behaviors, yielding a predictive-feature matrix for each behavioral domain and each brain state (Figure S7). The 12 predictive-feature matrices were then correlated across behavioral domains and brain states. The predictive-feature matrices were similar across brain states within each behavioral domain, especially in the case of personality and mental health (Figure 5B). Performing the same analyses using the data-driven behavioral clusters (Figures 6 & S8) yielded similar results.
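The data-driven clustering step can be sketched with scipy's average-linkage (UPGMA) routines, which the text cites; converting the correlations between predictive-feature vectors into distances as 1 − r is an assumption of this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_behaviors(feature_vectors, n_clusters=3):
    """Cluster behavioral measures by the similarity of their
    predictive-feature vectors (edges concatenated across brain states),
    using average-linkage (UPGMA) hierarchical clustering."""
    R = np.corrcoef(feature_vectors)   # behavior x behavior similarity
    D = 1.0 - R                        # turn correlation into a distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

Cutting the resulting dendrogram at three clusters corresponds to the three data-driven behavioral domains reported in Figure 6A.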
Overall, these results suggest that predictive network features were more similar within behavioral domains (cognition, personality, mental health) than across behavioral domains. Furthermore, predictive network features were similar across brain states. Critically, the similarity in predictive network features cannot be completely explained by similarity among the actual behavioral measures themselves (Figure S9). For example, "lack of planning" and "sensation seeking" shared predictive features with cognitive measures (Figure 6A), although the behavioral measures themselves were more correlated with other mental health and personality measures (Figure S9). As another example, the average correlation of predictive network features across cognitive measures was 0.68 ± 0.19 (mean ± std), while the correlation among the raw cognitive scores was 0.29 ± 0.22.

Figure 4. Predictive network features for positive urgency and negative urgency across all brain states. Haufe's approach was utilized to invert the kernel regression models (Haufe et al. 2014), which allowed us to interpret which features were important for predicting a particular behavior. A positive (or negative) predictive-feature value indicates that higher FC was associated with predicting greater (or lower) behavioral values. As can be seen, the predictive features were similar between positive urgency and negative urgency across all brain states (although there were also some differences), motivating further analyses (Figures 5 and 6). Predictive-feature matrices for all behavioral measures can be found in Figures S3 to S6.

Figure 5. Predictive network features are similar within behavioral domains and across brain states. (A) Correlations of predictive-feature matrices (Figure 4) across behavioral measures. The predictive-feature matrices were concatenated across brain states and correlated across behavioral measures. A high value (green) for a pair of behavioral measures indicates that the two measures are predicted by highly similar network features. (B) Correlations of predictive-feature matrices across brain states.
Predictive-feature matrices were averaged within each behavioral domain and correlated across brain states. The behavioral measures were ordered and categorized based on ABCD's classification of these measures into cognition, personality and mental health behavioral domains, so we refer to this ordering as "hypothesis-driven". Figure S10 shows the analogue of this figure without collapsing across either brain state or behavior. MID: monetary incentive delay; SST: stop signal task.

Figure 6. Predictive network features are similar within data-driven behavioral domains and across brain states. Both panels (A) and (B) are the same as Figure 5, except that behavioral measures are ordered and categorized based on the data-driven clusters of cognition, personality and mental health. These data-driven clusters were obtained by hierarchical clustering of the predictive-feature matrices (Figure 4), as indicated by the dendrogram in panel A. Clustering was performed using hierarchical agglomerative average linkage (UPGMA) clustering as implemented in scipy 1.2.1 (Virtanen et al. 2020). Figure S11 shows the analogue of this figure without collapsing across either brain state or behavior. MID: monetary incentive delay; SST: stop signal task.

Distinct brain network features support the prediction of cognition, personality, and mental health
Having established that predictive network features were similar within behavioral domains and across brain states, we investigated the topography of predictive network features that were shared across states within each behavioral domain. Predictive-feature matrices were averaged within each hypothesis-driven behavioral domain, yielding 12 predictive-feature matrices (one for each behavioral domain and each brain state; Figure S7).
To limit the number of multiple comparisons, permutation tests were performed for each within-network and between-network block by averaging predictive-feature values within and between the 18 networks (FDR q < 0.05; Figure S12).
To examine predictive features common across brain states, we averaged the predictive-feature matrices across all brain states, considering only network blocks that were significant and exhibited the same directionality across states (Figure 7A). This conjunction highlights predictive network features that are shared across brain states and across behavioral measures within a behavioral domain. Figure 7B illustrates the connectivity strength obtained from averaging within each significant block. Figures 7C and 7D illustrate the predictability of each cortical region, obtained by summing the rows of Figure 7A for positive and negative predictive-feature values separately (see subcortical regions in Figure S13A). In the case of cognition, for example, lower connectivity of somatomotor network B with subcortical and default network A regions was predictive of higher cognitive scores (i.e., better cognition). As another example, greater connectivity between salience/ventral attention network A and default network C, as well as lower connectivity between salience/ventral attention network A and control networks, were predictive of better cognition. In the case of mental health (Figures 7C & 7D), greater connectivity between default networks A/B and dorsal attention networks A/B was predictive of larger mental health scores (i.e., worse mental health). On the other hand, lower connectivity within default networks A/B was predictive of worse mental health.
As a control analysis, we utilized the previously derived data-driven clusters of cognition, personality and mental health ( Figure 6) to perform the same analyses, yielding highly similar results (Figures S14, S15 & S13B). Average correlations between the hypothesis-driven and data-driven predictive-feature matrices were r = 0.99 (cognition), 0.84 (personality) and 0.92 (mental health).

Control analyses
We performed several additional control analyses to ensure robustness of our results. First, we regressed age and sex (in addition to FD/DVARS) from the behavioral variables before prediction, which only decreased the prediction performance slightly ( Figure S16).
Second, instead of multi-kernel FC prediction, we averaged functional connectivity across all brain states (Elliott et al. 2019) and utilized the resulting mean-FC for kernel regression. We found that mean-FC yielded worse prediction performance than multi-kernel regression for cognition (Figure S17), but not for personality and mental health. This suggests that the improvement in predicting cognitive traits using multi-kernel FC was not simply due to more available data per individual. Third, to ensure our results were robust to the choice of regression model, we also performed linear ridge regression. We obtained similar prediction performance, although linear regression achieved worse COD (Figure S18). Remarkably, the predictive-feature matrices were highly similar for linear regression and kernel regression (average r = 0.99), suggesting that the predictive-feature matrices are robust to the choice of regression algorithm. We note that if we interpreted the regression weights directly without model inversion, then the agreement between kernel regression and linear regression "only" achieved an average correlation of r = 0.66. This observation confirms the importance of inverting the regression models (Haufe et al. 2014).
Fourth, we computed the predictive-feature matrices based on the single-kernel regression models and found that the results were highly similar to the predictive-feature matrices of the multi-kernel regression model (average r = 0.95).

Figure 7.
Brain network features that support individual-level prediction of cognition, personality and mental health. (A) Predictive-feature matrices averaged across brain states, considering only within-network and between-network blocks that were significant across all four brain states (Rest, MID, SST, N-Back). (B) Predictive network connections obtained by averaging the matrices in panel (A) within each between-network and within-network block. (C) Positive predictive features obtained by summing positive predictive-feature values across the rows of panel (A). A higher value for a brain region indicates that stronger connectivity yielded a higher prediction for the behavioral measure. (D) Negative predictive features obtained by summing negative predictive-feature values across the rows of panel (A). A higher value for a brain region indicates that weaker connectivity yielded a greater prediction for the behavioral measure. See Figure S13A for the subcortical maps. For visualization, the values within each matrix in panel A were divided by their standard deviations. The current figure utilized hypothesis-driven behavioral domains. Conclusions were highly similar using data-driven behavioral clusters ( Figure S15).

Discussion
In a large sample of typically developing children, we found that, compared with resting-FC, task-FC from certain tasks improved prediction of cognition, but not (impulsivity-related) personality or mental health. Integrating resting-FC and task-FC further improved prediction of cognition and personality, but not mental health. By considering a large number of measures across cognition, personality and mental health, we found that these behavioral domains were predicted by distinct patterns of brain network features. However, within a behavioral domain (e.g., cognition) and across brain states, the predictive network features were similar, suggesting the potential existence of shared neural mechanisms explaining individual variation within each behavioral domain.

Predictive brain network features cluster together within behavioral domains
Previous task-FC behavioral prediction studies have typically focused on specific cognitive traits, such as fluid intelligence, attention (Rosenberg et al. 2016) or reading comprehension (Gao et al. 2019). By exploring a wide range of behavioral measures, we gained insights into shared and unique predictive network features across traits within the same domain and across domains, as well as across brain states (rest and task). While there were differences among predictive network features within a behavioral domain (Figures S3-S6), the strong similarity was striking (Figures 5-6). This was especially the case for the cognitive domain. Decades of studies, ranging from lesion to functional neuroimaging studies, have suggested the existence of brain networks that are specialized for specific cognitive functions (Petersen et al. 1988, Freiwald and Tsao 2010, Nomura et al. 2010, Laird et al. 2011, Yeo et al. 2015). For example, language tasks activate a specific network of brain regions (Binder et al. 1997, Fedorenko et al. 2012, Braga et al. 2019). Another example is the specific loss of episodic memory, but not language, after medial temporal lobe lesions (Scoville and Milner 1957, Corkin 2002). Of course, the networks that preferentially underpin aspects of behavior do not work in isolation, and many studies have also emphasized information integration across specialized brain networks (van den Heuvel and Sporns 2011, Bzdok et al. 2016, Cohen and D'Esposito 2016, Bertolero et al. 2018). Lesion studies have also suggested that damage to connector hubs leads to deficits in multiple functional domains (Warren et al. 2014). Thus, while we did not expect predictive network features to be completely different across cognitive measures, we did not anticipate such strong similarity.
Similarly, in the case of mental health measures, while diagnostically distinct psychiatric disorders are likely the result of differentially disrupted brain systems, there is significant comorbidity among disorders and overlap in clinical symptoms (Kessler et al. 2011, Tamminga et al. 2013, Russo et al. 2014). Certain brain circuits have also been disproportionately reported to be transdiagnostically aberrant across multiple psychiatric and neurological disorders (Menon 2011, Whitfield-Gabrieli and Ford 2012, Goodkind et al. 2015, Baker et al. 2019, Kebets et al. 2019). For instance, there is evidence for a core role of frontoparietal network disruptions across psychiatric diagnoses (Cole, Repovš, et al. 2014). Therefore, as with cognition, we did not expect predictive network features to be completely different across mental health measures, but the degree of similarity was still surprising. These findings underscore the importance of studying multiple facets of psychopathology at once in order to better characterize covariation among symptoms and redefine psychiatric nosologies (Kozak and Cuthbert 2016, Kotov et al. 2017).
One possibility is that even though the regression models were trained on specific behavioral measures, the learned models might be predicting a broad behavior rather than the specific behaviors they were trained on. For example, in the case of cognition, perhaps the network features were simply predicting the g factor, a general cognitive ability that can account for half of the variance of cognitive test scores (Carroll 2003) . In the case of mental health, the network features might be predicting the p factor, a general psychopathology factor that reflects individuals' susceptibility to develop psychopathologies (Caspi et al. 2014) . The similarity in predictive network features across the personality measures was less surprising since the personality measures we considered were mostly impulsivity-related. Thus, the regression models might simply be predicting an overall impulsivity trait (Leshem and Glicksohn 2007) .

Distinct brain network features support the prediction of cognition, personality, and mental health
We found that cognitive performance was predicted by a distributed set of network features across the whole brain, with connectivity of the salience and somatomotor networks being particularly notable (Figures 7C & 7D). The involvement of the salience network might not be surprising given its role in saliency detection, switching, attention and control (Menon and Uddin 2010). The prominent role of the somatomotor network was more surprising, although somatomotor regions have been reported to be associated with fluid intelligence, attention (Rosenberg et al. 2016), and general cognitive dysfunction (Kebets et al. 2019).
Similarly to cognitive performance, (impulsivity-related) personality measures were predicted by a distributed set of network features across the whole brain. In the case of personality, connectivity involving the default and dorsal attention networks was particularly prominent. While classical models of impulsivity have typically highlighted dysregulation in fronto-striatal circuits, these have been predominantly informed by animal lesion, PET and case-control task activation studies (Jentsch and Taylor 1999, Dalley et al. 2008, Beck et al. 2009, Buckholtz et al. 2010, Fineberg et al. 2010, Balodis et al. 2012, Cubillo et al. 2012). Conversely, fMRI studies of healthy participants have reported correlations between trait impulsivity and resting-FC in default (Inuggi et al. 2014, Golchert et al. 2017) and attentional (Golchert et al. 2017) networks. FC measured during the SST in attentional regions has also been found to be correlated with impulsivity in adults (Farr et al. 2012). Our whole-brain connectome approach not only supports the roles of the default and attention networks in impulsivity suggested by these seed-based studies, but also extends these findings to children.
Finally, mental health measures were predicted by a distributed set of network features, with the connectivity of the default and frontoparietal control networks being particularly prominent. Connectivity involving default and frontoparietal regions has been linked to multiple psychiatric disorders and associated symptom profiles (Whitfield-Gabrieli and Ford 2012, Baker et al. 2014, Xia et al. 2018, Sha et al. 2019). We extend these findings by showing that the connectivity of these networks was important for predicting mental health in typically developing children prior to the onset of psychiatric illness and at a point where association cortices are still maturing.

Resting and task network organization
A surprising result is that the predictive network features were similar across brain states (rest, MID, SST, N-Back) for all behavioral domains, particularly in the case of personality and mental health. On the one hand, task network reorganization has been shown to influence cognitive performance (Schultz and Cole 2016, Zuo et al. 2018). On the other hand, our results are consistent with studies showing that task states only modestly influence functional connectivity (Cole, Bassett, et al. 2014, Bzdok et al. 2015), with inter-individual differences dominating task modulation (Gratton et al. 2018).
We note that a previous study (Gao et al. 2019) suggested that regression models utilized different network features for prediction across different brain states, while another study suggested that there was substantial overlap in predictive network features across resting-FC and task-FC. These discrepancies might arise because the previous studies only interpreted the most salient edges selected for prediction, which might yield unstable results. Here, we followed the elegant approach of Haufe and colleagues (2014) to invert the prediction models, leading to highly consistent predictive network features across two regression models (kernel ridge regression and linear regression). Without this inversion, agreement between the two models was weaker.
Consistent with previous studies (Yoo et al. 2018, Fong et al. 2019), we found that task-FC outperformed resting-FC for the prediction of cognitive performance, at least in the case of N-back and MID. Although resting-FC was better than SST-FC for predicting cognition (Figure 2), we note that there was more resting-fMRI data than SST-fMRI data, which might explain the gap in performance. Here, we did not control for fMRI duration because our goal was to maximize prediction performance and to quantitatively characterize the predictive network features (Bzdok and Ioannidis 2019). Similarly, the prediction improvement from integrating information across brain states (multi-kernel regression) partly comes from the use of more fMRI data per child, but at least in the case of cognition, the improvement was not entirely due to more data (Figure S17). Consistent with previous studies (Elliott et al. 2019, Gao et al. 2019, Jiang et al. 2019), we found that combining rest-FC and task-FC improved the prediction of cognition. Extending this work, we demonstrate that combining rest-FC and task-FC modestly improved the prediction of personality, but not mental health. We also found that, regardless of whether we used resting-FC, task-FC, or both, prediction was more accurate for cognition than for personality or mental health. This is again consistent with previous studies relating resting-fMRI to inter-individual variation in multiple behavioral domains (Dubois et al. 2018, Maglanoc et al. 2019).

Strengths and limitations
One strength of our study was the use of a whole-brain connectomics approach to predict a wide range of behavioral traits. Many neurodevelopmental studies have focused on specific brain circuits (Bjork et al. 2004, Galvan et al. 2006, Van Leijenhorst et al. 2010, Satterthwaite et al. 2012, Gee et al. 2013, Swartz et al. 2014, Jalbrzikowski et al. 2017, Silvers et al. 2017). Yet, the human brain comprises functional modules that interact as a unified whole to support behavior (Spreng et al. 2010, Bertolero et al. 2015, Bassett and Sporns 2017). Therefore, whole-brain network-level approaches could provide critical insights into neurodevelopment that might be missed by studies focusing on specific networks. Our results were also robust across brain states, across simple and more advanced predictive algorithms, and across recruitment sites. However, since the ABCD cohort comprises typically developing children, it is unclear how our results, especially those pertaining to mental health, might generalize to groups with clinical diagnoses. Furthermore, the cross-sectional nature of our study and the limited age range of the participants prevented us from thoroughly examining neurodevelopmental changes across time or age. Whole-brain neurodevelopmental studies have shown that functional networks become more distributed throughout adolescence (Fair et al. 2009, Supekar et al. 2009). As such, it remains to be seen how the predictive network features from our study might be affected by the developmental process. Lastly, we did not include any non-imaging features, which could have enriched our predictive models (Eickhoff and Langner 2019).

Conclusions
Our study demonstrated that combining task-FC and resting-FC can yield improved predictions of cognition and personality, but not mental health. Each behavioral domain was predicted by unique patterns of brain network features that were distinct from other behavioral domains. These features were robust across brain states and regression approaches. Overall, our findings revealed distinct brain network features that account for individual variation across broad domains of behavior, yet are shared for behaviors within the same domain.

Participants
We considered data from 11875 children from the ABCD 2.0.1 release. After strict preprocessing quality control (QC) and considering only participants with complete rest-fMRI, task-fMRI and behavioral data, our main analyses utilized 1858 unrelated children (Figure 1A). See further details below.

Imaging acquisition & processing
Images were acquired across 21 sites in the United States with harmonized imaging protocols for GE, Philips, and Siemens scanners. We used structural T1, resting-fMRI, and task-fMRI from three tasks: monetary incentive delay (MID), N-Back, and stop signal task (SST). See Supplemental Methods S1 for details.
Minimally preprocessed fMRI data (Hagler et al. 2019) were further processed with the following steps: (1) removal of the first four frames, (2) slice time correction with the FSL library (Jenkinson et al. 2002, Smith et al. 2004), (3) motion correction using rigid-body translation and rotation with FSL, and (4) alignment with the T1 images using boundary-based registration (Greve and Fischl 2009) with FsFast (http://surfer.nmr.mgh.harvard.edu/fswiki/FsFast). Functional runs with boundary-based registration costs greater than 0.6 were excluded. Framewise displacement (FD) (Jenkinson et al. 2002) and voxel-wise differentiated signal variance (DVARS) (Power et al. 2012) were computed using fsl_motion_outliers. Volumes with FD > 0.3 mm or DVARS > 50, along with one volume before and two volumes after, were marked as outliers and subsequently censored. Uncensored segments of data containing fewer than five contiguous volumes were also censored (Gordon et al. 2016). fMRI runs with over half of their volumes censored were removed. We also excluded from further analysis individuals who did not have at least 4 minutes of data for each fMRI state (rest, MID, N-Back, SST).
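The censoring rules above can be sketched as follows (a minimal illustration; the function name and array layout are ours, not those of the ABCD or CBIG pipelines):

```python
import numpy as np

def censoring_mask(fd, dvars, fd_thresh=0.3, dvars_thresh=50, min_segment=5):
    """Mark outlier volumes (FD > 0.3 mm or DVARS > 50), extend the censoring
    to one volume before and two volumes after each outlier, then censor any
    surviving segment shorter than 5 contiguous volumes. Returns a boolean
    array that is True for kept (non-censored) frames."""
    n = len(fd)
    keep = np.ones(n, dtype=bool)
    outliers = np.flatnonzero((fd > fd_thresh) | (dvars > dvars_thresh))
    for t in outliers:
        keep[max(t - 1, 0):min(t + 3, n)] = False  # frames t-1, t, t+1, t+2
    # censor uncensored segments with fewer than `min_segment` volumes
    start = None
    for t in range(n + 1):
        if t < n and keep[t]:
            if start is None:
                start = t
        elif start is not None:
            if t - start < min_segment:
                keep[start:t] = False
            start = None
    return keep
```

A run would then be dropped entirely if `keep.mean() < 0.5`, per the rule above.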
The following nuisance covariates were regressed out of the fMRI time series: global signal, six motion correction parameters, averaged ventricular signal, averaged white matter signal, and their temporal derivatives (18 regressors in total). Regression coefficients were estimated from the non-censored volumes. We chose to regress the global signal because we were interested in behavioral prediction, and global signal regression has been shown to improve behavioral prediction performance. The brain scans were then interpolated across censored frames using least squares spectral estimation, band-pass filtered (0.009 Hz ≤ f ≤ 0.08 Hz), projected onto the FreeSurfer fsaverage6 surface space, and smoothed with a 6 mm full-width half-maximum kernel.
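The nuisance regression step might look like the sketch below: coefficients are estimated on non-censored volumes only, then the fitted nuisance signal is removed from the full time series (function name is ours; the released pipeline operates per run on the 18 regressors described above):

```python
import numpy as np

def regress_nuisance(ts, regressors, keep):
    """Regress nuisance covariates out of ROI/vertex time series.
    ts: (T, V) data; regressors: (T, R), e.g. R = 18 columns (global signal,
    6 motion parameters, ventricular and white-matter signals, and their
    temporal derivatives); keep: boolean (T,) mask of non-censored volumes.
    Betas are estimated on kept volumes only, then the fitted nuisance
    signal is subtracted from all volumes."""
    X = np.column_stack([np.ones(len(ts)), regressors])  # add intercept
    beta, *_ = np.linalg.lstsq(X[keep], ts[keep], rcond=None)
    return ts - X @ beta
```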

Functional connectivity
We used a whole-brain parcellation comprising 400 cortical regions of interest (ROIs) (Schaefer et al. 2018) (Figure 1B) and 19 subcortical ROIs (Fischl et al. 2002) (Figure 1C). For each participant and each fMRI run, functional connectivity (FC) was computed as Pearson's correlations between the average time series of each pair of ROIs. FC matrices were averaged across runs from each state, yielding a 419 x 419 FC matrix for each fMRI state (rest, MID, N-back, SST). Censored frames were ignored when computing FC.
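The FC computation reduces to correlating ROI-averaged time series over non-censored frames, for example (function names are ours):

```python
import numpy as np

def fc_matrix(ts, keep=None):
    """Pearson FC between ROI time series.
    ts: (T, R) array of average time series for R ROIs (419 here);
    keep: optional boolean (T,) mask -- censored frames are ignored."""
    if keep is not None:
        ts = ts[keep]
    return np.corrcoef(ts, rowvar=False)  # (R, R) correlation matrix

def average_fc(run_fcs):
    """Average FC matrices across runs of the same fMRI state."""
    return np.mean(run_fcs, axis=0)
```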

Behavioral data
We analyzed data from all available dimensional neurocognitive (Luciana et al. 2018) and mental health assessments, yielding 16 cognitive, 9 mental health and 11 impulsivity-related personality measures. See Supplemental Methods S2 for more details. Participants who did not have all behavioral measures were excluded from further analysis.

Single fMRI-state prediction
We used kernel ridge regression to predict each behavioral measure based on resting-FC, MID-FC, N-back-FC and SST-FC separately. We chose kernel regression because of its strong prediction performance in resting-FC based behavioral prediction (He et al. 2020). Briefly, let $y_i$ and $FC_i$ be the behavioral measure and FC of training individual $i$, and let $y_t$ and $FC_t$ be the behavioral measure and FC of a test individual. Kernel regression predicts the test individual's behavior as a weighted average of the training individuals' behaviors, i.e., $y_t \approx \sum_i \mathrm{Similarity}(FC_i, FC_t)\, y_i$, where $\mathrm{Similarity}(FC_i, FC_t)$ was defined as the Pearson's correlation between $FC_i$ and $FC_t$. Thus, kernel regression assumes that individuals with more similar FC exhibit more similar behavior. To reduce overfitting, an $\ell_2$-regularization term was included; details of this approach can be found elsewhere (He et al. 2020).
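A minimal sketch of this estimator is shown below: a correlation kernel over vectorized FC plus the closed-form ridge solution. The bias is handled here by centering on the training mean, which is one common choice and may differ in detail from the released implementation (He et al. 2020); function names are ours.

```python
import numpy as np

def corr_kernel(FC_a, FC_b):
    """Kernel: Pearson correlation between vectorized (lower-triangular)
    FC matrices. FC_a: (n_a, E); FC_b: (n_b, E); E = number of edges."""
    A = FC_a - FC_a.mean(axis=1, keepdims=True)
    B = FC_b - FC_b.mean(axis=1, keepdims=True)
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def kernel_ridge_fit_predict(FC_train, y_train, FC_test, lam=1.0):
    """Closed-form kernel ridge: alpha = (K + lam*I)^(-1) (y - b), with the
    bias b taken as the training mean; prediction is b + K_test @ alpha."""
    K = corr_kernel(FC_train, FC_train)
    b = y_train.mean()
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - b)
    return b + corr_kernel(FC_test, FC_train) @ alpha
```

With `lam` near zero the model interpolates the training data; in practice `lam` is tuned by inner-loop cross-validation, as described below.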
Kernel regression was performed within an inner-loop (nested) cross-validation procedure. More specifically, there were 22 ABCD sites. As recommended by the ABCD consortium, individuals from Philips scanners were excluded due to incorrect pre-processing. Our final sample for the main analysis comprised 1858 children. To reduce sample size variability across sites, we combined sites together to create 10 "site-clusters", each containing at least 150 individuals (Table S4). Thus, participants within a site are in the same site-cluster.
We performed leave-3-site-clusters-out nested cross-validation for each behavioral measure with 120 replications. For each fold, a different set of 3 site-clusters was chosen as the test set. Kernel ridge regression parameters were estimated from the remaining 7 site-clusters using cross-validation. For model selection, the regularization parameter was estimated within the "inner-loop" of the inner-loop (nested) cross-validation procedure. For model evaluation, the trained kernel regression model was applied to all unseen participants from the test site-clusters.
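The split scheme can be sketched as below. Note that with 10 site-clusters there are C(10, 3) = 120 distinct choices of 3 test site-clusters, which matches the 120 replications (function name is ours):

```python
import numpy as np
from itertools import combinations

def leave_3_out_splits(site_cluster):
    """Generate leave-3-site-clusters-out splits.
    site_cluster: (N,) array of site-cluster labels for the N participants.
    Returns a list of (train_mask, test_mask) boolean pairs, one per
    distinct choice of 3 test site-clusters (C(10, 3) = 120 for 10 clusters)."""
    clusters = np.unique(site_cluster)
    splits = []
    for test_clusters in combinations(clusters, 3):
        test_mask = np.isin(site_cluster, test_clusters)
        splits.append((~test_mask, test_mask))
    return splits
```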
Head motion (mean FD and DVARS) was regressed from each behavioral measure before the cross-validation procedure. More specifically, regression coefficients were estimated from the 7 training site-clusters and applied to the 3 test site-clusters. This regression procedure was repeated for each split of the data into 7 training and 3 test site-clusters.
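To avoid leakage, the confound-regression coefficients must come from the training site-clusters only; a sketch (function name is ours):

```python
import numpy as np

def residualize_behavior(y_train, y_test, C_train, C_test):
    """Regress confounds (e.g., mean FD and DVARS) from a behavioral measure.
    Coefficients are estimated on the training site-clusters only and then
    applied to both training and test site-clusters.
    y_*: (n,) behavior; C_*: (n, k) confound matrix."""
    X_train = np.column_stack([np.ones(len(C_train)), C_train])
    X_test = np.column_stack([np.ones(len(C_test)), C_test])
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return y_train - X_train @ beta, y_test - X_test @ beta
```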
Prediction performance was measured by correlating predicted and actual measures (Finn et al. 2015). We also computed coefficients of determination, which yielded similar conclusions.

Multi-state prediction
To explore whether combining resting-FC and task-FC would result in better prediction accuracy, we utilized FC matrices from all four brain states (Rest, MID, SST, N-back) for prediction using a multi-kernel framework (Supplemental Methods S3). Similarly to single-kernel regression, multi-kernel regression assumed that subjects with similar FC exhibit similar behavioral scores. However, instead of taking into account FC from one fMRI state, here we utilized FC from all four fMRI states.

Statistical tests of prediction accuracy
To test whether a model achieved better-than-chance accuracy, we performed permutation tests by shuffling behavioral measures across participants and repeating the entire leave-3-site-clusters-out nested cross-validation procedure. To compare two models, a permutation test is not valid, so the corrected resampled t-test was utilized instead (Nadeau and Bengio 2003, Bouckaert and Frank 2004). The corrected resampled t-test accounts for the fact that the accuracies of test folds are not independent. We corrected for multiple comparisons using FDR (q < 0.05).
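A sketch of the corrected resampled t statistic (Nadeau and Bengio 2003): the naive variance of the fold-wise accuracy differences is inflated by a factor of (1/K + n_test/n_train) because the K test folds overlap; here the test/train ratio would be 3 vs 7 site-clusters (function name is ours):

```python
import numpy as np

def corrected_resampled_t(acc_a, acc_b, n_train, n_test):
    """Corrected resampled t statistic for comparing two models across K
    overlapping cross-validation replications. acc_a, acc_b: (K,) per-fold
    accuracies of the two models. The result should be compared against a
    t distribution with K - 1 degrees of freedom."""
    d = np.asarray(acc_a, float) - np.asarray(acc_b, float)
    K = len(d)
    return d.mean() / np.sqrt((1.0 / K + n_test / n_train) * d.var(ddof=1))
```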

Model interpretation
As shown above, multi-kernel FC yielded the best prediction performance. However, models estimated for prediction can be challenging to interpret (Bzdok and Ioannidis 2019). Here, we utilized the approach of Haufe and colleagues (2014), yielding a 419 x 419 predictive-feature matrix for each FC state and each behavioral measure (Supplemental Methods S4). A positive (or negative) predictive-feature value indicates that higher FC was associated with predicting greater (or lower) behavioral values.
The predictive-feature matrices were more similar among behavioral measures within the same behavioral domain (cognition, personality and mental health) than across domains. Thus, we averaged the predictive-feature matrices within each behavioral domain, considering only behavioral measures that were successfully predicted by multi-kernel FC regression. This yielded a 419 x 419 predictive-feature matrix for each fMRI state and each behavioral domain.
Statistical significance of the predictive-feature values was tested using a permutation test (2000 permutations). To limit the number of multiple comparisons, tests were performed for each within-network and between-network block by averaging predictive-feature values within and between 18 networks (Figures 6B & 6C). We corrected for multiple comparisons using FDR (q < 0.05).

Control analyses
Because the multi-kernel model contained more input data compared to the single-kernel models, we explored the potential effect of the amount of input data on model performance. To this end, we performed a single-kernel ridge regression on a general functional connectivity matrix created by averaging the functional connectivity across all fMRI conditions (rest + MID + N-Back + SST) to predict behaviors, which we called Mean FC. We then compared the performance of the Mean FC model with the best single-kernel fMRI model (e.g. N-Back only) and the multi-kernel model. To assess the impact of age and sex on model performance, we performed kernel ridge regression to predict behaviors after regressing out age and sex, in addition to head motion (mean FD and DVARS).

Code availability
Preprocessing utilized code from previously published pipelines: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/preprocessing/CBIG_fMRI_Preproc2016. Preprocessing code specific to this study can be found here: GITHUB_LINK. Analysis code specific to this study can be found here: GITHUB_LINK.

S1. MRI acquisition
For each participant, twenty minutes of resting-state fMRI data were acquired in four 5-minute runs. The task fMRI data consisted of three tasks (MID, N-back, SST) that were each acquired over two runs (for a total of six task fMRI runs). Each fMRI run was acquired in 2.4 mm isotropic resolution with a TR of 800 ms. The structural data consisted of one 1 mm isotropic scan for each participant. Full details of image acquisition can be found elsewhere (Casey et al. 2018) .

S2. Behavioral data
We analyzed data from all available dimensional neurocognitive (Luciana et al. 2018) and mental health assessments. For the neurocognitive assessments, we included the NIH Toolbox, Rey Auditory Verbal Learning Test, Little Man Task, and the matrix reasoning subscale from the Wechsler Intelligence Scale for Children-V, in order to measure different aspects of cognition. For the mental health assessments, we included the Achenbach Child Behavior Check List (CBCL), the mania scale from the Parent General Behavior Inventory, and the Pediatric Psychosis Questionnaire. For the personality measures, we included the Modified UPPS-P for Children and the Behavioral Inhibition and Activation scales. See Tables S1 & S2 for more details on each individual scale.

S3.1. Single-kernel ridge regression
For completeness, we provide a brief explanation of single-kernel ridge regression; the following is adapted from our previous study. Suppose we have $N$ training subjects. Let $y_i$ be the behavioral measure (e.g., fluid intelligence) and $x_i$ be the vectorized FC (considering only the lower triangular matrix) of the $i$-th training subject. Given $\{x_i\}$ and $\{y_i\}$, the kernel regression model is written as
$$y_i = b + \sum_{j=1}^{N} \alpha_j\, k(x_i, x_j), \qquad (1)$$
where $b$ is the bias term and $k(x_i, x_j)$ is the functional connectivity similarity between the $i$-th and $j$-th training subjects.
$k(x_i, x_j)$ is defined as the Pearson's correlation between the vectorized FC of the two subjects. The choice of correlation is motivated by previous fingerprinting and behavioral prediction studies (Finn et al. 2015, He et al. 2020). To estimate $b$ and $\alpha$ from the training set, let $y = [y_1, \dots, y_N]^T$, $\alpha = [\alpha_1, \dots, \alpha_N]^T$, and $K$ be the $N \times N$ kernel similarity matrix, whose $(i,j)$-th element is $k(x_i, x_j)$. Note that we can rewrite Eq. (1) as $y = K\alpha + b\mathbf{1}$. We can then estimate $\alpha$ and $b$ by minimizing the following $\ell_2$-regularized cost function:
$$\left\|y - K\alpha - b\mathbf{1}\right\|_2^2 + \lambda\, \alpha^T K \alpha,$$
where $\lambda$ controls the importance of the $\ell_2$-regularization and is estimated within the inner-loop cross-validation procedure. We emphasize that the test set was not used to estimate $\lambda$. Once $\alpha$ and $b$ have been estimated from the training set, the predicted behavior of test subject $t$ is given by
$$\hat{y}_t = b + \sum_{j=1}^{N} \alpha_j\, k(x_t, x_j).$$

S3.2. Multi-kernel ridge regression
Single-kernel ridge regression uses data from a single fMRI brain state for prediction. To extend to multiple fMRI brain states, we can construct one kernel similarity matrix for each fMRI brain state. Suppose we have $N$ training subjects and $S$ fMRI brain states. Let $y_i$ be the behavioral measure of the $i$-th training subject, and let $x_i^{(s)}$ be the vectorized FC of the $i$-th training subject for the $s$-th fMRI brain state. The multi-kernel regression model can be written as
$$y_i = b + \sum_{s=1}^{S} \sum_{j=1}^{N} \alpha_j^{(s)}\, k_s\!\left(x_i^{(s)}, x_j^{(s)}\right),$$
where $b$ is the bias term and $k_s(x_i^{(s)}, x_j^{(s)})$ is the functional connectivity similarity between the $i$-th and $j$-th training subjects for the $s$-th brain state. Like before, $k_s$ is defined as the Pearson's correlation between the vectorized FC of the two subjects for the $s$-th brain state.

Let $y = [y_1, \dots, y_N]^T$ and $\alpha^{(s)} = [\alpha_1^{(s)}, \dots, \alpha_N^{(s)}]^T$. Suppose $K_s$ is the kernel similarity matrix for the $s$-th brain state, whose $(i,j)$-th element is $k_s(x_i^{(s)}, x_j^{(s)})$. We can estimate $b$ and $\{\alpha^{(s)}\}$ by minimizing the following $\ell_2$-regularized cost function:
$$\Big\|y - \sum_{s=1}^{S} K_s \alpha^{(s)} - b\mathbf{1}\Big\|_2^2 + \sum_{s=1}^{S} \lambda_s\, {\alpha^{(s)}}^{T} K_s\, \alpha^{(s)},$$
where $\lambda_s$ controls the importance of the $\ell_2$-regularization for the $s$-th kernel. Here, $\lambda_s$ is estimated within the inner-loop cross-validation procedure using Gaussian-process optimization (Kawaguchi et al. 2015). We emphasize that the test set was not used to estimate $\lambda_s$. Once $b$ and $\{\alpha^{(s)}\}$ have been estimated from the training set, the predicted behavior of test subject $t$ is given by
$$\hat{y}_t = b + \sum_{s=1}^{S} \sum_{j=1}^{N} \alpha_j^{(s)}\, k_s\!\left(x_t^{(s)}, x_j^{(s)}\right).$$
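One compact way to implement this estimator is via the algebraic identity that, for positive-definite kernels, minimizing the cost above is equivalent to single-kernel ridge regression on the weighted combined kernel $\sum_s K_s / \lambda_s$ with unit regularization. This is a sketch under those assumptions, with the bias handled by centering on the training mean (function name is ours):

```python
import numpy as np

def multikernel_ridge_predict(Ks_train, Ks_test, y_train, lams):
    """Multi-kernel ridge prediction via the combined kernel sum_s K_s/lam_s.
    Ks_train: list of (N, N) train kernel matrices, one per fMRI state;
    Ks_test: list of (M, N) test-vs-train kernel matrices;
    lams: per-state regularization weights (tuned by Gaussian-process
    optimization in the paper; fixed here for illustration)."""
    C = sum(K / l for K, l in zip(Ks_train, lams))
    C_test = sum(K / l for K, l in zip(Ks_test, lams))
    b = y_train.mean()
    alpha = np.linalg.solve(C + np.eye(len(y_train)), y_train - b)
    return b + C_test @ alpha
```

With a single state this reduces exactly to the single-kernel predictor of S3.1.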

S3.3. Coefficient of determination (COD)
Suppose $M$ is the number of test subjects, $y_t$ and $\hat{y}_t$ are the ground-truth and predicted behavioral measure of the $t$-th test subject respectively, and $\bar{y}_{\text{train}}$ is the mean behavioral measure of all training subjects. The coefficient of determination is defined as
$$\mathrm{COD} = 1 - \frac{\sum_{t=1}^{M} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{M} (y_t - \bar{y}_{\text{train}})^2}.$$
Thus, a larger COD indicates more accurate prediction. A negative value implies that we would be better off using the mean behavior of the training subjects, rather than the FC data, to predict the behavior of the test subjects.
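This definition uses the training-set mean as the baseline (not the test-set mean, as in the textbook R²), which is what makes negative values interpretable as "worse than the training-mean baseline". A sketch (function name is ours):

```python
import numpy as np

def cod(y_true, y_pred, y_train_mean):
    """Coefficient of determination relative to the *training-set* mean:
    COD = 1 - SSE(prediction) / SSE(train-mean baseline). Negative values
    mean the FC-based prediction is worse than simply predicting the
    training mean for every test subject."""
    y_true = np.asarray(y_true, float)
    sse = np.sum((y_true - y_pred) ** 2)
    sse_baseline = np.sum((y_true - y_train_mean) ** 2)
    return 1.0 - sse / sse_baseline
```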

S4. Predictive-feature matrices
To interpret which brain edges were important for the multi-kernel FC model, we utilized the approach of Haufe and colleagues (2014) to invert the prediction model; failure to invert the model leads to uninterpretable results (Haufe et al. 2014). Consider the functional connectivity between brain regions $m$ and $n$. We would like to compute the predictive-feature value of this functional connection for the multi-kernel FC model. A positive (or negative) predictive-feature value for an edge indicates that higher FC between brain regions $m$ and $n$ was associated with predicting greater (or lower) behavioral values.
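The inversion amounts to covarying each edge with the model's training-set predictions; a sketch, with the per-measure rescaling described below (array layout and function name are ours):

```python
import numpy as np

def predictive_features(FC_train, y_pred_train):
    """Haufe et al. (2014) model inversion. The predictive-feature value of
    edge e is cov(FC_e, y_hat) across training subjects, divided here by
    std(y_hat) -- a per-measure constant that leaves the relative edge
    values unchanged but makes values comparable across behavioral measures.
    FC_train: (N, E) edge strengths; y_pred_train: (N,) model predictions
    on the training set. Returns an (E,) vector of predictive features."""
    X = FC_train - FC_train.mean(axis=0)
    yc = y_pred_train - y_pred_train.mean()
    cov = X.T @ yc / (len(yc) - 1)
    return cov / yc.std(ddof=1)
```

In the paper these values are computed per replication of the nested cross-validation and then averaged, as described below.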
Let $X_{mn}$ be the normalized functional connectivity strength between brain regions $m$ and $n$ across all training subjects; $X_{mn}$ is thus an $N \times 1$ vector, where $N$ is the number of training subjects. Normalization was performed so that the FC of each subject has zero mean and unit norm. Let $\hat{y}$ be the predictions of the training subjects' behavioral measure based on the estimated kernel regression model; $\hat{y}$ is thus also an $N \times 1$ vector. According to Haufe and colleagues (2014), the predictive-feature value of the edge is $\mathrm{cov}(X_{mn}, \hat{y})$. However, because we would like to compare across different behavioral measures, and the scale of $\hat{y}$ is very different across behavioral measures, we computed $\mathrm{cov}(X_{mn}, \hat{y}) / \mathrm{std}(\hat{y})$, which does not change the relative predictive-feature values among edges, but allows for comparisons between behavioral measures. We note that the above formula is applied to the training set, because we want to interpret the trained model. However, recall that we performed leave-3-site-clusters-out nested cross-validation for each behavioral measure with 120 replications. Thus, we computed the predictive-feature values for each replication and averaged them across the 120 replications.

Supplementary results

Figure S1. Cross-validated prediction performance (coefficient of determination; COD) using kernel ridge regression for the resting state and task states (MID, SST, N-Back). Multi-kernel FC utilized FC from all 4 brain states for prediction. Higher COD indicates greater variance predicted relative to the mean of the training data.

Figure S10. Similarity of predictive-network features for each significantly predicted behavior and brain state. The behavioral measures were ordered based on hypothesis-driven behavioral domains (cognition, personality and mental health). For each behavior, the brain states were ordered by Rest, MID, SST and finally N-Back. Red font indicates cognitive measures. Black/grey font indicates personality measures. Blue font indicates mental health measures. Predictive-network features were highly correlated within each hypothesis-driven behavioral domain and across brain states.

Figure S11. Similarity of predictive-network features for each significantly predicted behavior and brain state. The behavioral measures were ordered based on data-driven behavioral clusters (cognition, personality and mental health). For each behavior, the brain states were ordered by Rest, MID, SST and finally N-Back. Red font indicates cognitive measures. Black/grey font indicates personality measures. Blue font indicates mental health measures. Predictive-network features were highly correlated within each data-driven behavioral cluster and across brain states.

Figure S14. Predictive-feature matrices showing significant network blocks for each data-driven behavioral cluster (cognitive, personality, mental health) and for each brain state (Rest, MID, SST, N-Back) after permutation testing. For visualization, the values within each matrix were divided by their standard deviations. See Figures 6C and 6D for the cortical maps of the hypothesis-driven behavioral domains and Figures S15C and S15D for the cortical maps of the data-driven behavioral clusters.

Figure S15. Predictive brain network features for predicting cognition, personality and mental health. This figure is the same as Figure 6, but using data-driven behavioral clusters instead of hypothesis-driven behavioral domains. (A) Predictive-feature matrices averaged across brain states, considering only within-network and between-network blocks that were significant across all four brain states (Rest, MID, SST, N-Back). (B) Predictive network connections obtained by averaging the matrices in panel (A) within each between-network and within-network block. (C) Positive predictive features obtained by summing positive predictive-feature values across the rows of panel (A). A higher value for a brain region indicates that stronger connectivity yielded a higher prediction for the behavioral measure. (D) Negative predictive features obtained by summing negative predictive-feature values across the rows of panel (A). A higher value for a brain region indicates that weaker connectivity yielded a greater prediction for the behavioral measure. Conclusions were highly similar using hypothesis-driven behavioral domains (Figure 7).