Graph Learning for Cortical Parcellation from Tensor Decompositions of Resting-State fMRI

Cortical parcellation has long been a cornerstone in the field of neuroscience, enabling the cerebral cortex to be partitioned into distinct, non-overlapping regions that facilitate the interpretation and comparison of complex neuroscientific data. In recent years, these parcellations have frequently been based on the use of resting-state fMRI (rsfMRI) data. In parallel, methods such as independent components analysis have long been used to identify large-scale functional networks with significant spatial overlap between networks. Despite the fact that both forms of decomposition make use of the same spontaneous brain activity measured with rsfMRI, a gap persists in establishing a clear relationship between disjoint cortical parcellations and brain-wide networks. To address this, we introduce a novel parcellation framework that integrates NASCAR, a three-dimensional tensor decomposition method that identifies a series of functional brain networks, with state-of-the-art graph representation learning to produce cortical parcellations that represent near-homogeneous functional regions that are consistent with these brain networks. Further, through the use of the tensor decomposition, we avoid the limitations of traditional approaches that assume statistical independence or orthogonality in defining the underlying networks. Our findings demonstrate that these parcellations are comparable or superior to established atlases in terms of homogeneity of the functional connectivity across parcels, task contrast alignment, and architectonic map alignment. Our methodological pipeline is highly automated, allowing for rapid adaptation to new datasets and the generation of custom parcellations in just minutes, a significant advancement over methods that require extensive manual input. We describe this integrated approach, which we refer to as Untamed, as a tool for use in the fields of cognitive and clinical neuroscientific research. Parcellations created from the Human Connectome Project dataset using Untamed, along with the code to generate atlases with custom parcel numbers, are publicly available at https://untamed-atlas.github.io.

. Therefore, it is difficult to gauge their potential advantage with respect to neuroscientific studies over other more widely used approaches.Here, we use the state-of-theart network embedding method "Network embedding as Matrix Factorization" (NetMF) (Qiu et al., 2018).We also compared our parcellation series to an extensive set of widely-used parcellations in terms of commonly-used metrics such as Resting State Function Connectivity (RSFC) homogeneity, task contrast alignment, and architectonic map alignment.Our findings indicate that our parcellations either outperform or are on par with some of the most widely used atlases in neuroscience, including the Schaefer atlases (Schaefer et al., 2018).
We also discuss the rather surprising observation that, despite the marked differences in the boundary locations of our parcellations and those of the Schafer atlases, for example, they exhibit remarkably similar performance in terms of the metrics listed above.
Our methodological pipeline consists of temporal synchronization or alignment using BrainSync (Joshi et al., 2018;Akrami et al., 2019), brain network tensor decomposition using NASCAR (Li et al., 2023), and graph representation learning.The pipeline is largely automated, in contrast with several alternative parcellation methods (e.g., Glasser et al., 2016;Joshi et al., 2022) that require expert manual input.Users can apply our framework to new datasets with any desired number of parcels.
Once brain networks are computed using NASCAR, it takes only a few minutes on a modern workstation to generate an atlas with the desired number of parcels.Since we leverage the unconstrained spatial activation maps from NASCAR with temporal alignment plus node embedding, we named our cortical parcellations Untamed.

2
Material and methods

Overview
We derived cortical parcellations using rsfMRI data from 500 datasets.In this way, we avoid potential bias in the evaluation of Untamed when comparing to other parcellations that may arise from using data collected using the HCP protocol for both construction and evaluation of parcellations.

The Human Connectome Project (HCP) dataset
We utilized the minimally preprocessed 3T  filtered adjacency matrices of the graphs   ,   , for the left and the right hemispheres, respectively.

Graph node embedding and clustering
We adopted the method outlined in NetMF (Qiu et al where  is the degree matrix of the graph; log(•) and max{•, 1} are element-wise logarithmic and maximum operations, respectively;  denotes the number of negative samples; and  is the window length over which we consider nodes as co-occurrent positive pairs.We found that in practice the method is not sensitive to the choice of these two hyperparameters, which had little impact on the averaged training subjects' RSFC homogeneity (see evaluation metrics section).We set  We also evaluated the level of agreement between parcels obtained 82 from each parcellation method and selected architectonic areas from the 83 Brodmann atlas provided by HCP (https://balsa.wustl.edu/Wz8r).These  We used random parcellations as a null model to help us identify whether there is an optimal parcel number.We used the task variance, as described in section 2.6.2, as the performance metric.Specifically, we computed the ratio of the task variance obtained with Untamed to that from random parcellations.Random parcellations were generated through a region-growing process using the MNE-Python package (Gramfort, 2013).We computed this ratio and plotted it as a function of number of parcels (from 2 to 500 parcels per hemisphere).For each parcel number setting, 50 parcellation realizations were generated with different initialization seeds for both random and Untamed.The median of task variance across contrasts and trials was then computed for the HCP dataset and the MDTB dataset separately.The ratio plot was smoothed with a moving average with a window length of 5 for visualization.Since these results may be biased as HCP data was also used to generate 28 the Untamed atlases, we also include results for the Yale dataset.In this 29 case, Untamed achieved higher homogeneities than 9 out of the 17 30 atlases, but was lower than the others, notably including all the Schafer 31 atlases.However, in these cases, the effect sizes are relatively small (≤

32
.06).Because there are ~30% unlabeled vertices in Gordon, we excluded 33 these vertices from Untamed to match the number of vertices when 34 evaluating homogeneity.In this case, there is no significant difference.

35
However, if we assign the missing vertices to the nearest parcel as 36 described above (Gordon2) homogeneity is significantly higher for 37 Untamed. 38

Alignment with task contrasts 39
The violin plot in Fig. 4 shows the distribution of the differences in   cases with large effect sizes.In 2 cases (100, 130), there is no statistically significant difference and in the remaining 2 cases (114, 200) GLC is statistically significantly better ( < .05),but the effect size is small.These results support the use of NetMF embedding in place of the more standard GLC approach.

Ablation study of graph construction methods
We also explored the effect of graph construction as described in Section 2.9 by comparing results using the NASCAR-based adjacency matrix with that computed directly using Pearson correlation between rsfMRI time series.All other aspects of processing were identical.Pearson-based adjacency in all cases ( < .00001).This demonstrates an advantage of using the results of NASCAR tensor decomposition to identify spatial networks over that using the Pearson correlation between rsfMRI time series.

3.6
Is there an optimal number of parcels?
To explore this question, we compared how well the Untamed atlas identified regions of relatively homogenous functional activity, as reflected in the task maps, compared to random parcellations with an equal number of regions.Fig. 8 shows the ratio of variance for the HCP and MDTB task datasets between Untamed and random parcellations.
The curve obtained with HCP data is different from that with MDTB, possibly because of the different battery of fMRI tasks and contrasts employed.There is a distinct shallow minimum in the ratio plot in the MDTB data at ~61 parcels per hemisphere.The curve associated with the HCP dataset has a less clear trend where the optimal number of parcels lies between 100 and 150.

Discussion
In this paper we introduce "Untamed", a novel cortical parcellation scheme developed from population resting-state fMRI data.This scheme The per-parcel homogeneity scores between the two are much more 5 consistent than Untamed-100 and Random-100 (random parcellation 6 from region growing as described in section 2.10) as depicted in Fig. S6. 7 However, it appears that there is still room for further improvement with   The existence of a large number of brain atlases raises the question 32 of which to select for a particular purpose.In general, a cortical

Introduction
Cortical parcellation, which partitions the cerebral cortex into discrete, non-overlapping regions, is pivotal for the interpretation and comparison of neuroscientific findings.It simplifies the computational demand posed by high-dimensional neuroimaging data and is instrumental in elucidating cerebral organization.This endeavor has captivated the neuroscience community's interest for many years as evidenced by extensive research in this domain (Brodmann, 1909; Tzourio-Mazoyer et al., 2002; Yeo et al., 2011; Glasser et al., 2016; Schaefer et al., 2018; Eickhoff et al., 2018; Han et al., 2020; Joshi et al., 2022).Resting-state functional magnetic resonance imaging (rsfMRI) has become a cornerstone technique for non-invasively measuring spontaneous in vivo brain activity.By recording blood-oxygen-leveldependent (BOLD) signals without task stimuli, rsfMRI has proven instrumental in deciphering the human brain's intrinsic functional brain networks (Biswal et al., 1995; Beckmann et al., 2005; Allen et al., 2014). 2 emerging graph learning methods have not yet been fully exploited in brain parcellation methodology.A few studies that have applied graph representation learning to brain parcellation lack comprehensive comparisons with other parcellation schemes (e.g., Gopinath et al., 2019; He et al., 2020; from the Human Connectome Project (HCP) dataset (Van Essen et al., 2012; Glasser et al., 2013).The procedure includes a temporal alignment using BrainSync (Akrami et al., 2019; Joshi et al., 2018), a 3way tensor decomposition using NASCAR (Li et al., 2021, 2023), graph construction from spatial maps followed by graph node embedding using NetMF (Qiu et al., 2018), and finally -means clustering to obtain the parcellations.The overall pipeline is depicted in Fig. 1.The rsfMRI data from an independent set of 500 subjects from the HCP were used for evaluation.In addition to the HCP dataset, we also evaluated the parcellation performance on the Yale rsfMRI (Lee et al., 2022) and the multi-domain task battery (MDTB) (King et al., 2019; Zhi et al., 2022)

59 66 Fig. 1 .
Fig. 1.Pipeline for generating Untamed parcellations from HCP rsfMRI data.The input to the pipeline are the spatial activation maps from BrainSync and NASCAR applied to rsfMRI data of 500 subjects in HCP dataset as described in (Li et al., 2023).
= 1, the default value in the original NetMF paper, and  = 7 as the value that maximized the homogeneity of average RSFC over the training subjects.We computed  from  ̅  and  ̅  separately.We used singular value decomposition (SVD) to obtain the final embeddings (Qiu et al., 2018).Specifically, let  =       ⊺ , where   and   being the  largest singular values and their associated left singular vectors.The final embeddings were computed as  =  √   for the left (  ) and right hemisphere (  ) separately, with  being the embedding dimension.To obtain the final cortical parcellation, we applied  -means clustering to the feature vectors formed by the columns of   and   and varied the number of classes  from 21 to 200 to match the number of desired parcels in the parcellation.For each value of , we ran means clustering with 50 different random initializations and selected the result that yielded the highest homogeneity of the average RSFC of all training subjects.2.6 Evaluation metrics 2.6.1 Resting-state functional connectivity (RSFC) homogeneity Each subject's RSFC was computed as the Pearson correlation of rsfMRI time series between all pairs of cortical vertices.A Fisher ztransform was subsequently performed to obtain z scores from the correlation values.Similar to the evaluation procedure in (Joshi et al., 2022; Schaefer et al., 2018), a parcel-wise homogeneity score   was computed as the averaged RSFC values within the  th parcel.To obtain a global homogeneity measure for a single subject, a weighted average  of each parcel's homogeneity scores was then computed by accounting for different cluster sizes:  = ∑    =1 |  | || (3)where |  | is the number of vertices in parcel , and N is the total number of parcels; || is the total number of cortical vertices.For conciseness, we refer to this RSFC weighted average homogeneity as "homogeneity"

Table 2
contains summary statistics comparing RSFC homogeneity between Untamed and each of the other atlases with matched numbers of parcels for the HCP and Yale test sets.The supplementary Fig. S1 (for HCP) and Fig. 3 (for Yale) show the subject-wise differences in homogeneity as violin plots.In these results, a positive Cohen's  value indicates Untamed is more homogeneous than its counterpart and vice versa.When tested on the HCP data, Untamed achieves a clear improvement in homogeneity over all baseline atlases with statistical significance ( < 10 −3 ) with the exception of Schaefer-400 ( = .62).

40 41 Fig. 4 .Fig. 3 .
Fig. 4. Weighted average task contrast variance evaluated on MDTB test subjects.Each violin plot represents the difference between the weighted average variance for each group-average contrast map with Untamed and the matched baseline (Untamed baseline).A negative value indicates a lower average within-parcel variance in the Untamed atlas relative to the baseline atlas and vice versa.See text for discussion of statistical significance.See text for discussion of statistical significance.

Fig. 7 .
Fig. 7. Red area denotes premotor area (BA 6) and merged parcels from Untamed, Schaefer and Yan, all with 400 parcels.Dice coefficient scores are 0.820, 0.796, and 0.799 respectively between the merged parcels with the ROI.Derived with script provided by (Arslan et al., 2018).

Fig. S5 shows
Fig. S5 shows the difference in RSFC weighted average homogeneity between NASCAR-based and Pearson-based adjacency matrices.It is evident that the parcellations generated from the NASCAR-based adjacency matrix substantially outperform those generated from

1
constructs spatially disjoint parcels by leveraging the spatially overlapped brain networks identified by NASCAR tensor decomposition (Li et al., 2023, 2021).Fig. 9 shows a visual representation of both the 100 and 300 parcel versions of Untamed superimposed on the somatomotor sub-network, the default mode sub-network, and the ventral attention network of NASCAR.A key observation is that the parcel boundaries in Untamed align fairly closely with the transition between activated and deactivated regions in the NASCAR networks.Consequently, Untamed effectively bridges the gap between the spatially 42 overlapping brain networks identified by NASCAR and the pursuit of 43 spatially distinct brain parcellation.44We compared Untamed to an extensive list of popular atlases and exceed the performance of both by combining the best facets of each.Further support for this conjecture is shown in Fig 11 where we show the 2 RSFC homogeneity of Untamed-100 and Schaefer-100 as well as their 3 Dice coefficient for each of the Hugarian-algorithm matched parcels.4

8
respect to the Untamed and Schaefer results.

9
It is important to note that both the Schaefer and Untamed schemes 10 are based on rsfMRI; however, they were generated using distinct 11 acquisition protocols and independent subjects.The publicly released 12 Schaefer atlases, which were used in our experiments, were derived from 13 approximately 1500 subjects in the Genome Project dataset (GSP), 14 whereas our atlases were generated from an independent set of 500 15 subjects from the Human Connectome Project (HCP).Both datasets 16 consist of young adults (GSP: 18-35 yrs, HCP: 22-35 yrs).In addition to 17 the different subjects and data acquisitions, Schaefer utilized an entirely 18 different methodology (based on a Markov Random Field model) and 19 uses a distinct feature (RSFC).Exploring the application of the Untamed 20 methodology using the GSP data may offer further insights into the 21 similarity and differences between these two methodologies.22 As a final comparison we computed the RSFC between parcels for 23 the Untamed-200 and Schaefer-200 atlases.We used the Yale rsfMRI 24 dataset and performed a global signal regression before computing the 25 Pearson correlation between parcels on signals formed by averaging the 26 rsfMRI signal across each parcel, Fig. 12.We then used an automated 27 spectral clustering (Ng et al., 2001) method to identify 7 networks 28 consisting of groups of parcels that exhibit the most similarity in their 29 connectivity patterns.The ordered RSFC matrices and the corresponding 30 set of 7 networks (color coded) for each atlas are shown in Fig 12. 31

33 parcellation
should aim to segregate the cortex into distinct regions, with 34 the objective of maximizing the similarity of connectivity patterns, task 35 activation, and/or anatomical measures within each (Zhi et al., 2022).

Fig. 12 .
Fig. 12. RSFC based on (a) Untamed-200 and (b) Schaefer-200 (averaged across subjects) on Yale rsfMRI test subjects after global signal regression.Both clustered to 7 networks as in (Yeo et al., 2011) using spectral clustering.(c) and (d) shows all parcels of Untamed-200 and Shaefer-200 assigned network colors ., 2018) to embed vertices in a lower dimensional space with more representative features that can be used for clustering.This approach (Qiu et al., 2018) established an equivalence between the widely-used DeepWalk algorithm (Perozzi et al., 2014) and matrix factorization techniques.
hereafter.The homogeneity was computed for each test subject 53 separately.To quantitatively compare the homogeneity of Untamed to 54 that of other atlases, we performed a two-sided Wilcoxon signed rank 55 test between each pair of parcellations, and an effect size was reported 56 using Cohen's  measure.Similar to the procedure in (Schaefer et al., 2018; Joshi et al., 2022; Yan 66 71 variance" hereafter.Note that we compute tasks variance separately for 72 each contrast.73 When comparing parcellation schemes, we calculated the difference 74 between task variance between Untamed and its counterpart using a 75 matched number of parcels.A negative value indicates Untamed has a 76 lower variance, while a positive value indicates the counterpart has a 77 lower variance.Statistical significance was evaluated using a two-sided 78 Wilcoxon signed rank test between each pair of parcellations being

Table 1 .
List of atlases included in the comparison.Type denotes the modality of data the atlas is constructed from."Functional": the atlas is only based on fMRI; "Hybrid": the atlas is constructed from both functional MRI and anatomical information.
those of the other parcellation methods and atlases.Again, we used a 1 two-sided Wilcoxon signed rank test for statistical testing.3WecomparedourUntamedparcellations to the extensive set of 17 4 atlases listed in Table1.Because comparison of the evaluation metrics 5 defined above is only meaningful when comparing parcellations of 6 similar densities, in each case, we matched the number of parcels found 7 using Untamed to the number in the left and right hemispheres for each 8 baseline comparison.9Foratlases that were originally constructed in a space different from 10 HCP (e.g., MNI152, fsaverage), we used an atlas resampled onto the 11 HCP fs_LR 32K surfaces provided either by the original authors or a 12 third-party as listed in Table1.We note that a certain percentage of 13 cortical vertices on the resampled HCP surface for some baseline atlases 14 may not be assigned to any parcel due to either methodological 15 restriction or registration inaccuracy.In such cases, we discarded 16 vertices in Untamed where labels were missing in the baseline atlases for 17 a fair comparison.An extreme example of such cases is the atlas 27 decomposed into local contiguous parcels (51 parcels for the 7 networks 28 and 114 parcels for the 17 networks).30Wealso compared Untamed with the spectral clustering described in 31 (Ng et al., 2001), which applies a  -means clustering to the most 32 significant eigenvectors of the normalized graph Laplacian.All steps in 33 the learning procedure are identical except for the graph embedding 34 where we used NetMF.We applied both methods to the same graph 35 constructed from NASCAR spatial maps as described in section 2.4.For 36 both embedding methods, we set the spatial neighborhood constraint to 37 be 25-hop, and the embedding dimension to be 100.Both of the 38 embeddings were then separately supplied as input features for -means 39 clustering with 50 random initializations with a range of cluster numbers.40We evaluated weighted average homogeneity on the averaged RSFC 41 from the 500 training subjects in the HCP dataset.A two-sided Wilcoxon 42 signed rank test was used to compare the homogeneity values obtained 43 from the Laplacian eigenvectors used in spectral clustering and NetMF 44 utilized in Untamed.55 training subjects in the HCP dataset.A two-sided Wilcoxon signed rank 56 test was employed to statistically compare homogeneity values obtained 57 from NASCAR spatial maps and RSFC.58 https://balsa.wustl.edu/study/show/RVVG2.10 Is there a (natural) optimal number of parcels that can be identified from rsfMRI data?