## Abstract

A rapidly emerging application of network neuroscience in neuroimaging studies has provided useful tools to understand individual differences in complex brain function. However, the variability of methodologies applied across studies (with respect to node definition, edge construction, and graph measurements) makes it difficult to directly compare findings and challenging for end users to select the optimal strategies for mapping individual differences in brain networks. Here, we aim to provide a benchmark for best practices by systematically comparing the reliability of human brain network measurements of individual differences under different analytical strategies, using the test-retest design of the resting-state functional magnetic resonance imaging data from the Human Connectome Project. The results uncovered four essential principles to guide reliable network neuroscience of individual differences: 1) use a whole-brain parcellation to define network nodes, including subcortical and cerebellar regions, 2) construct functional connectomes using spontaneous brain activity in multiple slow bands, 3) optimize the topological economy of networks at the individual level, 4) characterise information flow with metrics of integration and segregation.

Over the past two decades, network neuroscience has helped transform the field of neuroscience (1), providing a quantitative methodological framework, namely connectomics (2), for modeling brains as graphs (or networks) composed of nodes (brain regions) and edges (their connections). The organization and topology of macro-scale brain networks can be characterized by a growing suite of connectomic measurements including efficiency, centrality, clustering, small-world topology, rich-club organization, etc. (3–5). In parallel, resting-state fMRI (rfMRI) has opened up new avenues towards understanding human brain function (6). In conjunction with network neuroscience, rfMRI has led to the emergence of a multidisciplinary field, functional connectomics or functional network neuroscience (FNN) (7–9), in which the brain’s intrinsic, interregional connectivity is estimated from rfMRI recordings. FNN has been widely used to investigate the system-level organization of human brain function (10) and its relationship with individual differences (11) in developmental (12), socio-cultural (13) and clinical conditions (14).

An important topic in FNN and, indeed, any scientific discipline, is the notion of measurement reliability. In general, reliability characterises the proportion of measurement variability between different subjects relative to the overall variability, which includes both between-subject and within-subject (i.e., random) components (15), and is commonly used to assess the consistency or agreement between measurements. However, measurement reliability can also serve as a measure of discriminability. For example, if a measurement can sufficiently capture individual characteristics, its reliability will be higher than that of measurements that underestimate between-subject variability. Thus, high reliability is essential for any measurement to better differentiate a group of individuals, i.e., to capture inter-individual differences (16). Recent studies have demonstrated that the reliability of a measurement is equivalent to its fingerprint or discriminability under the Gaussian distribution (17), while reliability has well-established statistical theory and applications in psychology (i.e., psychometric theory) (18) and medicine (i.e., diagnosis theory) (19). Reliability also provides an upper bound on measurement validity (5, 16), which cannot be quantified as readily as reliability (16). Therefore, a high level of reliability is the first and most basic requirement for quantifying individual differences in FNN. Accordingly, optimizing the measurement reliability of individual differences can help guide FNN processing and analysis pipelines for neurodevelopmental (20) and clinical applications (21).

Previous studies have demonstrated that many measurements made on networks estimated from rfMRI have limited reliability (22, 23). These low levels of reliability could be an indication of failure in handling individual variability at different levels (24, 25). In particular, experimental design and processing decisions related to scan duration, determining frequency range, and regressing global signal have impacts on rfMRI measurements and thus their reliability (23, 26). Although less focused on reliability, existing studies also revealed that their findings are influenced by choices of parcellation templates (27), edge construction and definition, and choice of graph metrics (28). How these decisions affect the reliability of FNN measurements deserves further investigation. These analytical choices have been implemented in different software packages (29) but can vary from one package to another in terms of their parameterization. Beyond limited examinations on reliability (30–32), a systematic investigation into the reliability of FNN measurements is warranted to guide FNN software use and analyses.

In this paper, we conducted a systematic FNN reliability analysis using the test-retest rfMRI data from the Human Connectome Project (HCP) (33). Note that the HCP imaging acquisition settings and data pre-processing have integrated various strategies to optimize measurement reliability (22, 34). We thus analyzed the minimally pre-processed HCP rfMRI data and focused our work on four key post-processing stages: node definition, edge construction, network measurement, and reliability assessment. In the end, we propose a set of principles to guide researchers in performing reliable FNN, answering the field's call for best practices in network neuroscience. Toward an open FNN, we released all the code and reliability data by building an online platform for sharing the data and computational resources.

## Results

A typical analysis pipeline in FNN includes steps for node definition (parcellations) and edge construction (frequency bands, connectivity estimation and filtering schemes) (Fig. 1a). To determine an optimal pipeline, our aim is to combine the most reliable strategies across different parts of the analysis by comparing the reliability of derived global network metrics. The HCP test-retest data were employed for reliability evaluation (Fig. 1b) using the intraclass correlation (ICC) statistic, with measurement reliability graded into five levels (36): 0 < ICC ≤ 0.2 (**slight**); 0.2 < ICC ≤ 0.4 (**fair**); 0.4 < ICC ≤ 0.6 (**moderate**); 0.6 < ICC ≤ 0.8 (**substantial**); and 0.8 < ICC < 1.0 (**almost perfect**). Our analyses produced a massive number of reliability statistics: 524,160 ICCs. In this section we first present overall reliability assessments associated with the various analytic strategies as well as their impact on between- and within-subject variability (Fig. 1c). We then determine the optimized pipelines based on the highest reliability measurements, and document the derived global and local network metrics together with their reliability and variability at the individual level. Based upon these results, we built open resources for reliable FNN, including all the code, reliability matrices and computation *via* an online platform (http://ibraindata.com/research/reliablenetworkneuroscience).
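To make the reliability scale concrete, the following minimal sketch estimates an ICC from a subjects-by-sessions table and maps it onto the five levels listed above. It is an illustration under stated assumptions, not the pipeline used in the study: it uses a simple one-way random-effects ANOVA estimator rather than the linear mixed models mentioned later in the text, and the function names `icc_oneway` and `icc_level` as well as the label `"none"` for non-positive values are hypothetical.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC for a subjects x sessions table.

    scores: list of per-subject lists, one value per session
    (two sessions for a test-retest design).
    Returns (icc, v_b, v_w): the ICC and the estimated between- and
    within-subject variance components.
    NOTE: a simplified stand-in for a linear mixed model (assumption).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of sessions per subject
    grand = mean(v for row in scores for v in row)
    subj_means = [mean(row) for row in scores]
    # Mean squares between subjects and within subjects.
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum((v - m) ** 2
              for row, m in zip(scores, subj_means)
              for v in row) / (n * (k - 1))
    v_w = msw
    v_b = max((msb - msw) / k, 0.0)   # negative estimates are clipped to zero
    total = v_b + v_w
    icc = v_b / total if total > 0 else 0.0
    return icc, v_b, v_w


def icc_level(icc):
    """Map an ICC onto the five reliability levels quoted in the text."""
    if icc <= 0.0:
        return "none"     # hypothetical label, not defined in the text
    if icc <= 0.2:
        return "slight"
    if icc <= 0.4:
        return "fair"
    if icc <= 0.6:
        return "moderate"
    if icc <= 0.8:
        return "substantial"
    return "almost perfect"


# Toy test-retest data: 4 subjects, 2 sessions each (made-up numbers).
toy = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.1]]
icc, v_b, v_w = icc_oneway(toy)
print(icc_level(icc))
```

In this toy table the subjects differ far more from each other than each subject differs between sessions, so the between-subject component dominates and the ICC is high.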

### Whole brain networks are more reliable than cortical networks

Elements derived from a brain parcellation (i.e., parcels) define the network nodes. Here, we evaluated reliability based on 24 different parcellation choices (Fig. 2a, see more details of these parcellations in **Online Methods**). In the following parts of the paper, we name a parcellation as *‘ParcAbbr-NumberOfParcels’* (e.g., LGP-100 or its whole-brain version wbLGP-458).

We found significant differences in ICC distributions across the 24 parcellation choices (Fig. 2b, Friedman rank sum test: *χ*^{2} = 20379.07, *df* = 23, *p* < 2.2 × 10^{−16}, effect size *W*_{Kendall} = 0.377). The mean ICCs range from slight (LGP-1000) to substantial (wbLGP-458). Given a particular parcellation and definition of nodes, we illustrate the density distribution of its ICCs under all other strategies (edge definition and metric derivation). Notably, whole-brain parcellations yield higher measurement reliability than parcellations of the cerebral cortex on their own (effect sizes > 0.65). This improvement in reliability does not seem to be simply a by-product of having more parcels. We chose the parcellations in which the number of parcels (400 ≤ *n* ≤ 1000) almost overlapped between the cortex and the whole brain, and found no correlation between the number of parcels and the median ICCs (*r* = −0.11, *p* = 0.7). We report the mean ICC and the number of almost perfect (noap) ICCs (≥ 0.8) as the descriptive statistics for the density distributions. The wbLGP-458 (mean ICC: 0.671; noap ICC: 519), wbLGP-558 (mean ICC: 0.671; noap ICC: 540) and wbBNP-568 (mean ICC: 0.664; noap ICC: 511) are the three most reliable choices (see more details of the post-hoc Wilcoxon signed rank test in Table S7). Among the cortical parcellations, the LGP-500 (mean ICC: 0.362; noap ICC: 0), LGP-400 (mean ICC: 0.342; noap ICC: 0) and LGP-600 (mean ICC: 0.340; noap ICC: 0) are the three most reliable choices (Table S3).
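For readers unfamiliar with the statistics quoted above, the sketch below shows how a Friedman rank sum statistic and the Kendall's *W* effect size relate to each other. It is a textbook illustration only, assuming that rows are matched analysis configurations (blocks) and columns are the strategies being compared; how the study defined its blocks is not specified in this section. No tie correction is applied to the statistic, and the names `rank_row` and `friedman_kendall` are hypothetical.

```python
def rank_row(values):
    """Rank values within one block (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_kendall(table):
    """Friedman chi-square, degrees of freedom and Kendall's W.

    table: n blocks (rows) x k conditions (columns), e.g. ICCs of the same
    configuration under k different strategies (assumed layout).
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, r in enumerate(rank_row(row)):
            rank_sums[col] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    w = chi2 / (n * (k - 1))   # Kendall's W lies between 0 and 1
    return chi2, k - 1, w


# Toy example: 4 blocks that all rank the 3 conditions identically.
chi2, df, w = friedman_kendall([[0.1, 0.3, 0.6]] * 4)
print(chi2, df, w)
```

When every block ranks the conditions in the same order, *W* reaches its maximum of 1; values near 0 indicate no consistent ordering across blocks.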

To better understand the effect of introducing 358 subcortical parcels into the cortical parcellations, we decomposed the reliability changes into a two-dimensional representation of changes of individual variability (Fig. 2c,d). This idea was motivated by the analysis of reliability derived with individual variability (15, 16) as in Fig. 1c. For each ICC under a given parcellation choice, we calculated the related between-subject variability *V*_{b} and within-subject variability *V*_{w}. Changes in the individual variability associated with the reliability improvements from cortical to whole-brain pipelines were plotted along with Δ*V*_{b} and Δ*V*_{w} as arrows. These arrows are distributed across three quadrants (quadI: 0.94%; quadII: 59.99%; quadIII: 39.07%). Most of these arrows fall into the optimal quadrant, where the improvements in test-retest reliability from the whole-brain parcellation choices are largely attributable to increases in between-subject variability and decreases in within-subject variability. Decreases in both between-subject and within-subject variability may also strengthen the measurement reliability (the suboptimal quadIII in Fig. 2).
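The quadrant decomposition described above can be sketched as follows. This is a minimal illustration that assumes the horizontal axis is the change in within-subject variability and the vertical axis is the change in between-subject variability, so that quadII (between-subject up, within-subject down) is the optimal quadrant; the axis assignment and the function names are assumptions. Because the ICC is the share of between-subject variability in the total, a change that lowers between-subject variability while raising within-subject variability (quadIV here) can only lower the ICC, which is consistent with reliability improvements occupying just three quadrants.

```python
from collections import Counter


def quadrant(delta_vw, delta_vb):
    """Quadrant of one arrow in the (delta V_w, delta V_b) plane.

    x axis: change in within-subject variability (assumed).
    y axis: change in between-subject variability (assumed).
    """
    if delta_vb > 0 and delta_vw > 0:
        return "quadI"     # both variabilities increase
    if delta_vb > 0:
        return "quadII"    # between up, within down: optimal
    if delta_vw <= 0:
        return "quadIII"   # both decrease: suboptimal
    return "quadIV"        # between down, within up


def quadrant_shares(arrows):
    """Percentage of arrows per quadrant; arrows are (delta_vw, delta_vb) pairs."""
    counts = Counter(quadrant(dw, db) for dw, db in arrows)
    return {q: 100.0 * c / len(arrows) for q, c in counts.items()}


# Made-up arrows for illustration only.
toy_arrows = [(-0.1, 0.2), (-0.2, 0.3), (-0.1, -0.05), (0.1, 0.4)]
print(quadrant_shares(toy_arrows))
```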

### Spontaneous brain activity portrays more reliable networks in higher slow bands

Brain oscillations are hierarchically organized, and their frequency bands were theoretically driven by the natural logarithm linear law (35, 37). By analogy, rfMRI oscillations can similarly be partitioned into distinct frequency bands. Thanks to the fast imaging protocol (TR = 720 ms), the HCP test-retest data allow us to obtain more oscillation classes than traditional rfMRI acquisitions (typical TR = 2 s). We applied Buzsáki’s framework (35, 38) to the HCP dataset using the DREAM toolbox (39) in the Connectome Computation System (29) to decompose the time series into six slow bands (Fig. 3a):

- **slow-6** (0.0069-0.0116 Hz)
- **slow-5** (0.0116-0.0301 Hz)
- **slow-4** (0.0301-0.0822 Hz)
- **slow-3** (0.0822-0.2234 Hz)
- **slow-2** (0.2234-0.6065 Hz)
- **slow-1**^{−} (0.6065-0.6944 Hz)

Note that, because of the limited sampling rate (TR), **slow-1**^{−} covers only a small part of the full **slow-1** band (0.6065-1.6487 Hz), as its superscript indicates. We also included the frequency band **slow-emp** (0.01-0.08 Hz) for the sake of comparison, as it covers a range commonly used in rfMRI studies. The density distributions of ICCs (Fig. 3b) revealed a significant effect of frequency band (*χ*^{2} = 9283.536, *df* = 6, *p* < 2.2 × 10^{-16}, *W*_{Kendall} = 0.192), with the following order: slow-2, slow-1^{−}, slow-3, slow-emp, slow-4, slow-5, slow-6. Post-hoc paired tests indicated that any pair of neighbouring bands is significantly different (for more details, see Table S9–13), with measurement reliability increasing with faster frequency bands. Note, however, that slow-1^{−} (mean ICC: 0.564) did not fit into this trend, possibly due to its limited coverage of the full band. Remarkably, though, slow-1^{−} exhibited the largest number of almost perfect ICCs (noap ICC: 1746, for more details, see Figure S8). Slow-emp (mean ICC: 0.519; noap ICC: 434) contains overlapping frequencies with both slow-4 (mean ICC: 0.560; noap ICC: 441) and slow-5 (mean ICC: 0.494; noap ICC: 285), and higher ICCs than the two bands, but the effect sizes are small to moderate (slow-emp vs. slow-4: 0.193; slow-emp vs. slow-5: 0.485). Slow-6 is the choice with the lowest ICCs (mean ICC: 0.331; noap ICC: 154) compared to the other bands (large effect sizes: *r* > 0.57).
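The upper limit of slow-1^{−} follows from the sampling rate: the highest observable frequency is the Nyquist frequency, 1/(2·TR), which is about 0.6944 Hz for TR = 0.72 s. The short sketch below checks which of the bands listed above are fully, partially or not at all observable for a given TR. It is an illustration only: the band edges are copied from the list above, the lower limit imposed by scan length is ignored, and the function names are hypothetical rather than part of the DREAM toolbox.

```python
# Band edges in Hz, as listed in the text (full slow-1 range included).
SLOW_BANDS = [
    ("slow-6", 0.0069, 0.0116),
    ("slow-5", 0.0116, 0.0301),
    ("slow-4", 0.0301, 0.0822),
    ("slow-3", 0.0822, 0.2234),
    ("slow-2", 0.2234, 0.6065),
    ("slow-1", 0.6065, 1.6487),
]


def nyquist_hz(tr_seconds):
    """Highest frequency observable at a given repetition time."""
    return 1.0 / (2.0 * tr_seconds)


def band_coverage(tr_seconds):
    """Label each band 'full', 'partial' or 'none' relative to the Nyquist limit.

    Only the upper (sampling-rate) limit is considered; the lower limit set
    by the scan duration is ignored in this sketch.
    """
    f_max = nyquist_hz(tr_seconds)
    coverage = {}
    for name, low, high in SLOW_BANDS:
        if high <= f_max:
            coverage[name] = "full"
        elif low < f_max:
            coverage[name] = "partial"
        else:
            coverage[name] = "none"
    return coverage


print(round(nyquist_hz(0.72), 4))   # HCP protocol
print(band_coverage(0.72))
print(band_coverage(2.0))           # a typical slower acquisition
```

With TR = 2 s the Nyquist limit is 0.25 Hz, so slow-2 is only partially observable and slow-1 not at all, which illustrates why the faster HCP protocol yields more oscillation classes.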

To visualize variation in reliability across frequency bands, we plotted a trajectory tracing reliability flow along the five full bands (slow-6 to slow-2) in the reliability plane, whose axes correspond to between- *versus* within-subject variability (Fig. 3c). As expected, this nonlinear trajectory contains two stages of almost linear changes in network measurement reliability from slow to fast oscillations: whole brain versus cortex. In each case, the reliability improvements are attributable to both increases in between-subject variability and decreases in within-subject variability, while the improvements in whole-brain network measurement reliability were largely driven by the increased variability between subjects.

### Topological economics individualize highly reliable functional brain networks

Estimating functional connections can be highly challenging due to the absence of a ‘ground truth’ human functional connectome. To provide a reliable way of building candidate edges of the connections, we sampled 12 schemes of graph edge filtering (Fig. 4a), which turn a fully connected matrix into a sparse graphical representation of the corresponding brain network. These schemes can be categorized into two classes: threshold-based *versus* topology-based schemes. Absolute weight thresholding (ABS_{05}), proportional thresholding (PROP_{10}, PROP_{20}), degree thresholding (DEG_{5}, DEG_{15}), overall efficiency cost optimization (ECO) and global cost efficiency optimization (GCE) all employ a threshold to retain edges with strengths higher than a cut-off value. These schemes are widely used in network neuroscience but ignore the intrinsic topological structure of the entire brain network (e.g., leading to multiple connected components or isolated nodes). In contrast, topology-based schemes such as minimum spanning tree (MST), orthogonal MST (OMST), planar maximally filtered graph (PMFG) and triangulated maximally filtered graph (TMFG) come from other scientific disciplines and are optimized based on the entire network topology (41–44). To combine the TMFG’s efficiency and the OMST’s accuracy, we proposed the orthogonal TMF graph (OTMFG). All the schemes are plotted in the plane of cost *versus* global-cost efficiency (45) to better visualize the economical properties of the derived networks (Fig. 4b). These plots are fitted into topographic (contour) maps where the local maximum for each filtering choice is labeled as a circle. The human brain networks achieve higher global efficiency with lower cost using topology-based schemes compared to threshold-based schemes, suggesting increasingly optimal economics.
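The contrast between the two classes of schemes can be illustrated with a toy example. The sketch below implements a simple proportional threshold and a spanning tree built with Kruskal's algorithm on descending weights (equivalent to a minimum spanning tree on inverted weights). It is a minimal illustration, not the implementation used in the study: the 5-node matrix `EXAMPLE_W` is made up, all function names are hypothetical, and the more elaborate schemes (OMST, PMFG, TMFG, OTMFG) are not covered.

```python
# Made-up symmetric connectivity matrix for 5 nodes (illustration only).
EXAMPLE_W = [
    [0.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 0.0, 0.7, 0.3, 0.1],
    [0.8, 0.7, 0.0, 0.2, 0.4],
    [0.2, 0.3, 0.2, 0.0, 0.5],
    [0.1, 0.1, 0.4, 0.5, 0.0],
]


def sorted_edges(w):
    """Upper-triangle edges as (weight, i, j), strongest first."""
    n = len(w)
    return sorted(((w[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                  reverse=True)


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def proportional_threshold(w, proportion):
    """Threshold-based: keep the strongest `proportion` of all possible edges."""
    edges = sorted_edges(w)
    keep = max(1, round(proportion * len(edges)))
    return [(i, j) for _, i, j in edges[:keep]]


def spanning_tree(w):
    """Topology-based: Kruskal's algorithm, adding strongest edges that join components."""
    parent = list(range(len(w)))
    tree = []
    for _, i, j in sorted_edges(w):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def count_components(n, edges):
    """Number of connected components of a graph with n nodes."""
    parent = list(range(n))
    for i, j in edges:
        parent[_find(parent, i)] = _find(parent, j)
    return len({_find(parent, x) for x in range(n)})


prop20 = proportional_threshold(EXAMPLE_W, 0.20)
mst = spanning_tree(EXAMPLE_W)
print(len(prop20), count_components(5, prop20))
print(len(mst), count_components(5, mst))
```

In this toy matrix the 20% threshold keeps only edges within the strongly coupled cluster and leaves two nodes isolated, whereas the spanning tree always connects all nodes with exactly one edge fewer than the number of nodes. Because the tree depends only on the rank order of the weights, any strictly increasing rescaling of the matrix yields the same tree.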

Significant differences in test-retest reliability were detectable across these 12 edge-filtering schemes (*χ*^{2} = 9784.317, *df* = 11, *p* < 2.2 × 10^{−16}, *W*_{Kendall} = 0.189, see Fig. 4c). Among the topology-based schemes, OMST (mean ICC: 0.608; noap ICC: 765), OTMFG (mean ICC: 0.602; noap ICC: 781) and TMFG (mean ICC: 0.570; noap ICC: 767) were the three most reliable choices. They showed significantly greater reliability than the three most reliable threshold-based schemes, respectively: PROP_{20} (mean ICC: 0.593; noap ICC: 632), PROP_{10} (mean ICC: 0.549; noap ICC: 445) and GCE (mean ICC: 0.533; noap ICC: 352). The mean reliability of MST is slight to fair (mean ICC: 0.309), but its number of almost perfect ICCs (noap ICC: 362) is still higher than that of all threshold-based schemes except PROP_{10} and PROP_{20} (see more details in Figure S21).

Network measurements are labeled based on topology and threshold groups and projected onto the reliability anatomy plane, whose axes represent between- and within-subject variability (Fig. 4d). The contour maps are reconstructed for each scheme based upon the individual variability of all the related network measurements. The topology-based methods (red) showed overall higher ICCs than the threshold-based methods (blue), improvements that could be attributed to increases in between-subject variability and decreases in within-subject variability. These observations are consistent between cortical and whole-brain networks, while topology-based whole-brain networks are almost perfectly reliable (i.e., ICC ≥ 0.8).

We also examined the measurement reliability of two further factors involved in edge filtering: the choice of connectivity transformation and the weighting of edges. Positive (Eq.pos) (mean ICC: 0.512; noap ICC: 1,031) and exponential (Eq.exp) transformation (mean ICC: 0.509; noap ICC: 1,855) were the two most reliable choices. Compared with the positive and absolute (Eq.abs) (mean ICC: 0.508; noap ICC: 1,050) transformations, the exponential and distance-inverse (Eq.div) (mean ICC: 0.500; noap ICC: 1,031) transformations show a larger number of almost perfect ICCs (see Table S15–21). Weighted graphs are also more reliable than binary graphs, while normalized weighted graphs demonstrated the highest ICCs, reflecting both increased between-subject variability and decreased within-subject variability.

### Network integration and segregation can serve reliable metrics of information flow

The previous big data analysis suggests that the optimally reliable pipeline should: 1) define network nodes using a whole-brain parcellation, 2) filter the time series with higher frequency bands, 3) transform the connectivity using positive transformation, 4) construct network edges using individualized methods and normalized weights. Using the optimal pipelines, we evaluated the reliability levels of various metrics from network neuroscience and their differences across individuals. Focusing on the optimized pipeline with the highest ICCs of the various choices (wbLGP-458, slow-2, pos, OMST), we reported test-retest reliability of the measurements as well as their corresponding individual variability. In Fig. 5a, we found that the global network measurements of information segregation and integration are at the level of almost perfect reliability except for the modularity *Q* (ICC=0.46, 95% CI = [0.252,0.625]). These high-level ICCs are derived with large between-subject variability and small within-subject variability (Fig. 5b). These findings are reproducible across the other two parcellation choices (wbCABP-718, wbBNP-458).
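As an illustration of the integration metrics discussed here, the sketch below computes the characteristic path length and the global efficiency of a small unweighted, undirected graph using breadth-first search. It is a simplified example under stated assumptions: the study works with weighted networks, whereas this sketch treats every edge as having length 1, averages path length over reachable pairs only, and uses the hypothetical function names `shortest_path_lengths` and `integration_metrics`.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def integration_metrics(n, edges):
    """Characteristic path length and global efficiency of a binary graph.

    Path length is averaged over reachable ordered pairs; efficiency is the
    mean inverse distance over all ordered pairs, with unreachable pairs
    contributing zero.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    total_len, total_inv, reachable = 0, 0.0, 0
    for s in range(n):
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total_len += d
                total_inv += 1.0 / d
                reachable += 1
    char_path = total_len / reachable if reachable else float("inf")
    global_eff = total_inv / (n * (n - 1))
    return char_path, global_eff


# A 3-node chain versus a fully connected triangle.
print(integration_metrics(3, [(0, 1), (1, 2)]))
print(integration_metrics(3, [(0, 1), (1, 2), (0, 2)]))
```

The fully connected triangle reaches the best possible values (path length 1, efficiency 1), while removing one edge lengthens paths and lowers efficiency.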

Similar to the global metrics, shortest path length *L*_{p} and nodal efficiency *E*_{nodal} exhibited the highest ICCs (almost perfect test-retest reliability) while ICCs of other nodal metrics remained less than 0.6. To visualize node-level network metrics, we reported results derived from the wbCABP-718 choice. To improve spatial contrasts of reliability, we ranked the parcels according to their ICCs and visualized the ranks in Fig. 5c. Most nodal metrics are more reliable across the 360 cortical areas than the 358 subcortical areas (Wilcoxon tests: all p-values less than 0.001, corrected for multiple comparisons). However, *L*_{p}, *E*_{nodal} and *B*_{c} exhibited higher ICCs across subcortical areas than cortical areas (corrected *p* < 0.001). Across the human cerebral cortex, the right hemispheric areas demonstrated more reliable *C*_{p} (corrected *p* < 0.0036) than the left hemispheric areas. Interesting patterns of the reliability gradient are also observable along large-scale anatomical directions (dorsal>ventral, posterior>anterior) across the nodal metrics of information segregation and centrality. These spatial profiles of reliability reflected the corresponding inter-individual variability of these metrics, characterising the network information flow through the slow-2 band.

### Building an open resource for reliable network neuroscience

The results presented here represent huge costs in terms of computational resources (more than 1,728,000 core-hours on **CNGrid**, supported by the Chinese Academy of Sciences (http://cscgrid.cas.cn)). Derivations of the ICCs and their linear mixed models were implemented in **R** and **Python**. As part of our open science practice, we have started to provide an online platform for the reliability assessments (http://ibraindata.com/research/reliablenetworkneuroscience/reliabilityassessment). The reliability data were organized into an online database that offers the community a resource for searching reliable choices and supporting final decision-making. The website for this online database provides more details on the use of the reliability data (http://ibraindata.com/research/reliablenetworkneuroscience/database). Finally, we shared all the code, figures and other reliability resources via the website to boost reliable FNN.

## Discussions

This study examined the series of processing and analysis decisions in constructing graphical representations of brains. The focus, here, was on identifying the pipeline that generated reliable, individualized networks and network metrics. The results of our study suggest that to derive reliable global network metrics with higher inter-individual variances and lower intra-individual variances, one should use whole-brain parcellations to define network nodes, focus on higher frequencies in the slow band for time-series filtering to derive the connectivity, and use topology-based methods for edge filtering to construct sparse graphs. Regarding network metrics, multi-level or multi-modal metrics appear more reliable than single-level or single-modal metrics. Deriving reliable measurements is critical in network neuroscience, especially for translating network neuroscience into clinical practice, which requires precise and specific biomarkers. Based on these results, we provide four principles of reliable functional connectomics, which we discuss further in this section.

### Principle I: Use a whole brain parcellation to define network nodes

The basic unit of a graph is the node. However, variability across brain parcellations can yield dissimilar graphs, distorting network metrics and making it difficult to compare findings across studies (27, 46–48). In many clinical applications (14, 21), researchers aim to identify disease-specific connectivity profiles of the whole brain, including cortical and subcortical structures, as well as the cerebellum. A recent review raised the concern that many studies have focused on restricted sets of nodes, e.g. cortex only, and called for a field standard for best practices in clinical network neuroscience (24), which requires almost perfectly reliable measurements (15, 49). Our meta-reliability assessments revealed that high reliability of measurements on functional brain networks can be achieved through the inclusion of high-resolution subcortical nodes. This provides strong evidence that the use of whole-brain nodes should be part of the standard analysis pipeline for network neuroscience applications. These improvements in reliability can be attributed to increases in between-subject variability coupled with reductions in within-subject variability relative to networks of cortical regions alone. One possible neuroanatomical explanation is that distant areas of cerebral cortex are interconnected by the basal ganglia and thalamus (50) while also communicating with different regions of the cerebellum *via* polysynaptic circuits (51, 52), forming an integrated connectome. These subcortical structures have been suggested to play a role in both primary (e.g., motor) and higher-order functions (e.g., learning and memory (53)). Studies using rfMRI have delineated the resting-state functional connectivity (RSFC) maps between these subcortical structures and cortical networks of both primary and high-order functions (54–56). A recent work revealed that inter-individual variance in cerebellar RSFC networks exceeds that of cortex (57). 
Meanwhile, these RSFC maps are highly individualized and stable within individuals (58–60), indicating that they possess reliable characteristics. In line with our observations, we argue that inclusion of the subcortical structures as network nodes can enhance the between-subject variability and stabilize the within-subject variability by providing a more comprehensive measurement of the entirety of brain connectivity. Larger between-subject variability implies that the associated measurements are more distinguishable between different subjects, leading to improved subject discrimination, a finding that has been demonstrated (61, 62).

### Principle II: Generate functional networks using spontaneous brain activity in multiple slow bands

It has been a common practice in RSFC research area to estimate the RSFC profile based on the low-frequency (0.01 - 0.1 Hz or 0.01 - 0.08 Hz) fMRI time series (6). However, the test-retest reliability of measurements made based on this frequency band has been limited, with ICCs less than 0.4 (see (22, 23) for systematic reviews). Other applications, however, have advocated adopting a multi-frequency perspective to examine the amplitude of brain activity at rest (63) and its network properties (64). This approach has been spurred along by recent advances in multi-banded acquisitions and fast imaging protocols, offering fMRI studies a way to examine resting-state brain activity at relatively higher frequencies that may contain neurobiologically meaningful signals (39, 65). Our study provides strong evidence of highly reliable signals across higher slow-frequency bands, which are derived with the hierarchical frequency band theory of neuronal oscillation system (35). Specifically, a spectrum of reliability increases was evident from slow bands to fast bands. This reflects greater variability of the network measurements between subjects and less measurement variability within subject between the higher and lower bands of the slow frequencies. In theory, each frequency band has an independent role in supporting brain function. Lower frequency bands are thought to support more general or global computation with long-distance connections to integrate specific or local computation, which are driven by higher slow bands based on short-distance connections (37). Our findings of high reliability (inter-individual differences) are perfectly consistent with this theory from a perspective of individual variability. Previous findings have found that high-order associative (e.g., default mode and cognitive control) networks are more reliable than the primary (e.g., somatomotor and visual) networks (16, 22, 23). 
Our findings offer a novel frequency-based perspective on these network-level individual differences.

### Principle III: Optimize topological economy to construct network connections at individual level

There is no gold standard for human functional connectomes, leading to a plurality of approaches for inferring and constructing brain network connections. Threshold-based methods focus on the absolute strength of connectivity, retaining connections that are above some user-defined threshold, and oftentimes involve applying the same threshold to all subjects. Although this approach mitigates potential biases in network metrics associated with differences in network density, it may inadvertently also lead to decreased variability between subjects. This is supported by our finding that threshold-based methods yield low reliability of network measurements. On the other hand, the human brain is a complex network that is also near-optimal in terms of connectional economy, balancing tradeoffs of cost with functionality (66). In line with this view, certain classes of topology-based methods for connection definition may hold promise for individualized network construction. Specifically, each individual brain optimizes its economic wiring in terms of cost and efficiency, reaching a trade-off between minimizing costs and allowing the emergence of adaptive topology. Our results demonstrate that such highly individualized functional connectomes generated by the topology-based methods are more reliable than those generated by the threshold-based methods. This reflects increases in individual differences in functional connectomes that are attributable to the optimal wiring economics at the individual level. The topological optimization also brings other benefits such as ensuring that a graph forms a single connected component and preserving weak connections. Indeed, there is increasing evidence supporting the hypothesis that weak connections are neurobiologically meaningful and explain individual differences in mind, behavior and demographics as well as disorders (67–69). 
Weak connections in a graph may be consistent across datasets and reproducible within the same individual over multiple scan sessions and therefore be reliable. Weak connections might also play non-trivial roles in transformed versions of the original brain network, e.g. so-called “edge-based functional connectivity” (70). Among these topology-based methods, MST is the simplest and a promising filtering method if computational efficiency is the priority. MST yields graphs with the same number of nodes and edges across subjects, and it is not sensitive to scaling effects, because its structure depends only on the order rather than the absolute values of the edges (71). Although MST loses some local network measurements due to the limited number of edges, it has some other unique metrics that can be calculated (e.g., leaf fraction, tree hierarchy). A better alternative might be TMFG, which is computationally very efficient and statistically robust, while the OMST and OTMFG are the most reliable choices when priority is given to large individual differences.

### Principle IV: Characterise information flow with network integration and segregation metrics

Functional connectomes reflect the outcome of communication processes and information flows between pairs of brain regions. How information and other signals propagate between pairs of brain regions can be assayed using network neuroscientific metrics and is essential to understanding normative connectome function and its variation in clinical settings (72). While the ground truth functional connectome is unknown (and may not exist (73)), network models can help validate the imaging-based reconstructions of human functional connectomes (1). From the perspective of individual differences, reliable FNN is the basis for valid measurement of individual differences in FNN metrics (16). Our findings indicated that both brain network segregation and integration could be reliably measured with functional connectomics using rfMRI under the optimized pipelines. At the global level, measures of information integration (e.g., characteristic path length and efficiency) were more reliable than measures of information segregation (e.g., modularity and clustering coefficient). Our results also revealed that integration measures were more stable across scan sessions (i.e., test-retest) within an individual subject than segregation measures, while inter-individual variability was at a similar level for both integration and segregation metrics. At the nodal level, mapping the reliability of the network measurements revealed interesting spatial patterns. Specifically, we found that cortical areas were generally associated with more reliable local measurements than subcortical areas. This may reflect different functional roles for human cortex and subcortex. For example, the differences in reliability of path-based metrics might reflect the fact that there are more within-community paths in cortex while between-community paths are more common in subcortex.
Beyond this cortical-subcortical gradient, reliability of the nodal information flow also followed a left-right asymmetry as well as dorsal-ventral and posterior-anterior gradients, implying the potential validity of individual differences in information flow attributable to evolutionary, genetic and anatomical factors (74–77). To facilitate the use of reliable network integration and segregation metrics in FNN, we integrated all the reliability resources into an online platform for reliability queries on specific metrics of information flow (http://ibraindata.com/research/reliablenetworkneuroscience).

### Conclusion, limitations and future

Here, we adopted a big data approach to systematically explore the reliability of functional brain networks by richly sampling the parameters of various steps in the network construction and analysis pipeline. The results of this analysis provided robust experimental evidence supporting four key principles for reliable network neuroscience measurements and applications. These principles can serve as the basis for building guidelines on the use of FNN to map individual differences. Standard guidelines are essential for improving reproducibility in research practice, and our findings provide experimental resources for such standardization in future network neuroscience applications. We note, however, that while our approach was extensive, it was not exhaustive: the analytical sampling procedure could miss many other existing choices (e.g., consensus-based thresholding for the edge filtering stage). The processing decisions that yield reliable connectomes may yield the most reliable network statistics, but there may be other ways to process data that yield an overall higher level of reliability in network measures. Future work can build on our study by exploring these and other choices using the online computation and evaluation platform that accompanies the present study. Of note, measurement reliability is not the final goal; validity is, and it must be considered even though it is not readily amenable to direct examination (16). Validation (through various indirect validity assessments) of the proposed principles represents a promising arena for future FNN studies (5).

## Online Methods

Using the HCP test-retest dataset, our analytic procedure implemented four post-processing stages (Fig. 1a): node definition, edge construction, network measurement and reliability assessment. In the first step, the test-retest rfMRI dataset underwent the standardized preprocessing pipeline developed by the HCP team (34). The second step defined nodes (green box) using sets of brain areas based on 24 partitions, and then extracted the nodal time series. During the third step (yellow box), individual correlation matrices were first estimated based upon the six frequency bands derived from Buzsáki’s theoretical framework on brain oscillations (35) along with the widely used classical band (0.01 - 0.08 Hz). These matrices were then converted into adjacency matrices using 4 × 12 = 48 edge filtering strategies (4 connectivity transformations × 12 filtering schemes). In the fourth step, we performed graph analyses (blue box) by systematically calculating the brain graph metrics at global, modular and nodal scales. Finally, test-retest reliability was evaluated (red box) as ICCs with linear mixed models.
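To make the size of the sampled analytic space concrete, the following minimal sketch enumerates the combinations implied by the counts in the paragraph above (24 partitions, six bands plus the classical band, and 4 × 12 edge strategies). The labels are placeholders introduced here for illustration and are not the identifiers used in the study.

```python
from itertools import product

# Counts are taken from the text; every label below is a hypothetical placeholder.
parcellations = [f"parc{i}" for i in range(1, 25)]           # 24 partitions
bands = [f"band{i}" for i in range(1, 7)] + ["slow-emp"]     # six bands + classical band
transforms = ["pos", "abs", "exp", "div"]                    # 4 connectivity transformations
schemes = [f"scheme{i}" for i in range(1, 13)]               # 12 edge filtering schemes

# Each edge strategy pairs one transformation with one filtering scheme.
edge_strategies = list(product(transforms, schemes))

# Each pipeline is one (parcellation, band, edge strategy) combination.
pipelines = list(product(parcellations, bands, edge_strategies))

print(len(edge_strategies))  # 48
print(len(pipelines))        # 8064
```

Under these counts, each subject and scan would yield 8,064 brain graphs before any network metric is computed.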

### Test-Retest Dataset

The WU-Minn Consortium in HCP shared a set of test-retest multimodal MRI datasets of 46 subjects from both the S1200 release and the Retest release. These subjects were retested using the full HCP 3T multimodal imaging and behavioral protocol. Each subject underwent the four scans on two days (two scans per day: Rest1 versus Rest2) during the first visit and returned several months later to finish the four scans on another two days during the second visit (Fig. 1b). The test-retest interval ranged from 18 to 328 days (mean: 4.74 months, standard deviation: 2.12 months). Only 41 subjects (28 females, age range: 26-35 years; 13 males, age range: 22-33 years) had full length rfMRI data across all the eight scans, and were included in the subsequent analyses. As indicated in the literature (22, 34), rfMRI protocols used by HCP for scanning and preprocessing images have been optimized for reliability.

During the scanning, participants were instructed to keep their eyes open and to let their mind wander while fixating on a cross-hair projected on a dark background. Data were collected on the 3T Siemens Connectome Skyra MRI scanner with a 32-channel head coil. All functional images were acquired using a multiband gradient-echo EPI imaging sequence (2mm isotropic voxel, 72 axial slices, TR = 720ms, TE = 33.1ms, flip angle = 52°, field of view = 208 × 180 mm^{2}, matrix size = 104 × 90 and a multiband factor of 8). A total of 1200 images was acquired for a duration of 14 min and 24 s. Details on the imaging protocols can be found in (78).

The protocols of rfMRI image preprocessing and artifact-removal procedures are documented in detail elsewhere and generated the minimally preprocessed HCP rfMRI images. Note that artifacts were removed using the Oxford Center for Functional MRI of the Brain’s ICA-based X-noiseifier (ICA + FIX) procedure, followed by MS-MAll for inter-subject registration. The preprocessed rfMRI data were represented as a time series of grayordinates (4D), combining both cortical surface vertices and subcortical voxels (34).

### Node Definition

A brain graph defines a node as a brain area, which is generally derived from an element of a brain parcellation (parcel) according to borders or landmarks of brain anatomy, structure or function, or from an element of volume (voxel) in imaging signal acquisition or a cluster of voxels (79). Due to the high computational demand of voxel-based brain graphs, in this study we defined nodes as parcels according to the following brain parcellation strategies (Fig. 2a). A surface-based approach has been demonstrated to outperform other approaches for fMRI analysis (26, 80) and thus the nodes are defined in the surface space (total 30 surface parcellation choices). We adopted a naming convention for brain parcellations as follows: *‘ParcAbbr-NumberOfParcels’* (e.g., LGP-100 or its whole-brain version wbLGP-458).

### HCP Multi-Modal Parcellation (MMP)

This cortical parcellation was generated from multi-modal images of 210 adults from the HCP database, using a semi-automated approach (81). Cortical regions are delineated with respect to their function, connectivity, cortical architecture, and topography, as well as expert knowledge and meta-analysis results from the literature (81). The atlas contains 180 parcels for each hemisphere.

### Local-Global Parcellation (LGP)

A gradient-weighted Markov Random Field model integrating local gradient and global similarity approaches produces the novel parcellations (82). The final version of LGP comes with a multi-scale cortical atlas including 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 parcels (equal numbers across the two hemispheres). One benefit of using LGP is to have nodes with almost the same size, and these nodes are also assigned to the common large-scale functional networks (40).

### Brainnetome Parcellation (BNP)

Both anatomical landmarks and connectivity-driven information are employed to develop this volumetric brain parcellation (83). Specifically, anatomical regions defined as in (84) are parcellated into subregions using functional and structural connectivity fingerprints from HCP datasets. Cortical parcels are obtained by projecting their volume space to surface space. Note that the original BNP contains both cortical (105 areas per hemisphere) and subcortical (36 areas) regions, but only the 210 cortical parcels are included in the subsequent analyses.

### Whole-Brain Parcellation (wb)

Inclusion of subcortical areas has been shown to have a non-negligible influence on brain graph analyses (23, 60), and we thus also constructed brain graphs with subcortical structures in volume space as nodes by adding these nodes to the cortical brain graphs. To get a high-resolution subcortical parcellation, we adopted the 358 subcortical parcels in (85). The authors employed data of 337 unrelated HCP healthy volunteers and extended the MMP cortical network partition into subcortex. This results in a set of whole-brain parcellations by combining these subcortical parcels with the aforementioned cortical parcellations, namely **wbMMP**, **wbLGP** and **wbBNP**. We noticed that the wbMMP-718 has been named by the authors of (85) as the Cole-Anticevic Brain-wide Network Partition, and we thus renamed the wbMMP-718 as wbCABP-718 for consistency.

### Edge Construction

After defining the nodes with each parcellation, the regional mean time series of each parcel was estimated by averaging the vertex time series at each time point. To construct an edge between a pair of nodes, their representative time series went through the following steps in order: *band-pass filtering*, *inter-node connectivity transformation*, and *edge filtering*.

#### Band-Pass Filtering

Resting-state functional connectivity studies have typically focused on fluctuations below 0.08 Hz or 0.1 Hz (6, 86), and assumed that only these frequencies contribute significantly to inter-regional functional connectivity (FC) while other frequencies are artifacts (87). In contrast, however, other studies have found that specific frequency bands of the rfMRI oscillations make unique and neurobiologically meaningful contributions to resting-state functional connectivity (22, 88). More recently, with fast fMRI methods, some meaningful FC patterns were reported across much higher frequency bands (89). These observations motivate exploring a range of frequency bands beyond those typically studied in resting-state functional connectivity studies, including faster frequencies.

Buzsáki and Draguhn (35) proposed a hierarchical organization of frequency bands driven by the natural logarithm linear law. This offers a theoretical template for partitioning rfMRI frequency content into multiple bands (Fig. 3a). The frequencies occupied by these bands have a relatively constant relationship to each other on a natural logarithmic scale and have a constant ratio between any given pair of neighboring frequencies (37). These different oscillations are linked to different neural activities, including cognition, emotion regulation, and memory (37, 64, 86). Thanks to the fast imaging protocols offered by the HCP scanner, the short scan interval (TR = 720ms) allows us to obtain more oscillation classes than the traditional rfMRI method. We incorporated Buzsáki’s framework (35, 38) with the HCP fast-TR datasets by using the DREAM toolbox (39) in the Connectome Computation System (29). It decomposed the time series into the six slow bands as illustrated in Fig. 3a.
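As a rough illustration of how the sampling parameters bound the usable frequency range, the sketch below computes the Nyquist frequency and the lowest resolvable frequency from the TR and scan length given above, and then splits that range into bands whose edges differ by a constant ratio. The ratio of *e* and the choice to start from the Nyquist frequency are assumptions made here for illustration; the function `log_spaced_bands` is hypothetical, and the actual band limits are those produced by the DREAM toolbox and shown in Fig. 3a.

```python
import math

TR = 0.72          # seconds, from the acquisition protocol
N_VOLUMES = 1200   # images per scan

nyquist = 1.0 / (2.0 * TR)       # highest resolvable frequency (Hz)
f_min = 1.0 / (N_VOLUMES * TR)   # lowest resolvable frequency (Hz)

def log_spaced_bands(f_low, f_high, ratio=math.e):
    """Return full bands (lo, hi) within [f_low, f_high] with hi / lo == ratio.

    Illustrative only: the ratio and the top-down anchoring are assumptions.
    """
    bands = []
    hi = f_high
    while hi / ratio >= f_low:
        lo = hi / ratio
        bands.append((lo, hi))
        hi = lo
    return bands

bands = log_spaced_bands(f_min, nyquist)
for lo, hi in bands:
    print(f"{lo:.4f} - {hi:.4f} Hz")
```

Neighboring band edges in this sketch are equally spaced on a natural logarithmic axis, which is the property the framework relies on.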

#### Connectivity Transformation

For each scan, individual nodal representative time series were band-pass filtered with each of the six frequency bands, and another empirical frequency band, slow-emp (0.01-0.08Hz). The Pearson’s correlation $r_{ij} \in [-1, 1]$ between the filtered time series of each pair of nodes $i = 1, \ldots, N$, $j = 1, \ldots, N$ was calculated ($N$ is the number of nodes). These correlation values provided an estimate of the edge strengths between the two nodes, and formed an $N \times N$ symmetric correlation matrix $R = (r_{ij})$ for each given subject, scan, parcellation, and frequency band.

Many network metrics are not well defined for negatively weighted connections. In order to ensure that the connection weights are positive only, we applied four types of transformations to the symmetric correlation matrix: the **positive** (Eq.pos), **absolute** (Eq.abs), **exponential** (Eq.exp) and **distance-inverse** (Eq.div) functions, respectively. This avoids the negative values in the inter-node connectivity matrix $W = (w_{ij})$, where $z_{ij} = \tanh^{-1}(r_{ij})$ is Fisher’s $z$-transformation.

The connectivity matrix represents a set of the node parcels and relational quantities between each pair of the nodes, and will serve as the basis of following edge filtering procedure for generation of the final brain graphs.
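The correlation and Fisher *z* steps described above can be sketched in a few lines. This is a minimal pure-Python illustration on toy time series, not the study's implementation; the helper names are hypothetical, and the four weight transformations are deliberately not reproduced because their equations are not given in the text.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """Symmetric N x N matrix R = (r_ij) from a list of N nodal time series."""
    n = len(series)
    R = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            R[i][j] = R[j][i] = pearson(series[i], series[j])
    return R

def fisher_z_matrix(R):
    """z_ij = atanh(r_ij); the diagonal is set to 0 because atanh(1) is undefined."""
    n = len(R)
    return [[math.atanh(R[i][j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]

# Toy nodal time series (three nodes, five time points).
series = [
    [0.1, 0.5, 0.9, 0.4, 0.2],
    [0.2, 0.6, 1.0, 0.3, 0.1],
    [0.9, 0.4, 0.1, 0.6, 0.8],
]
R = correlation_matrix(series)
Z = fisher_z_matrix(R)
```

Note that the sketch assumes no pair of nodes is perfectly correlated, since $|r_{ij}| = 1$ has no finite Fisher *z* value.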

#### Edge Filtering

In a graph, edges represent a set of relevant interactions of crucial importance to obtain parsimonious descriptions of complex networks. Filtering valid edges can be highly challenging due to the lack of ‘ground truth’ of the human brain connectome. To provide a reliable way of building candidate edges, we sampled the following 12 schemes on edge filtering and applied them to the connectivity matrices.

### Absolute Weight Thresholding (ABS)

This approach selects those edges that exceed a manually defined absolute threshold (e.g., correlations higher than 0.5), setting all correlations smaller than 0.5 to 0 (ABS_{05}). This is a simple approach to reconstruct networks (90).
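A minimal sketch of absolute thresholding on a small weight matrix follows. The function name is hypothetical, and treating a weight exactly equal to the threshold as retained is an assumption, since the text only states that values smaller than the threshold are set to 0.

```python
def absolute_threshold(W, threshold=0.5):
    """Keep weights >= threshold, set the rest (and the diagonal) to 0."""
    n = len(W)
    return [[W[i][j] if i != j and W[i][j] >= threshold else 0.0
             for j in range(n)]
            for i in range(n)]

# Toy symmetric connectivity matrix.
W = [
    [0.0, 0.9, 0.3],
    [0.9, 0.0, 0.6],
    [0.3, 0.6, 0.0],
]
A = absolute_threshold(W, threshold=0.5)
```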

### Proportional Thresholding (PROP)

It is a common step in the reconstruction of functional brain networks to ensure equal edge density across subjects (91–93). It keeps the number of connections fixed across all individuals to rule out the influence of network density on the computation and comparison of graph metrics across groups. This approach selects a fixed percentage of the strongest connections as edges in each individual network or brain graph. Compared to ABS, PROP has been argued to reliably separate density from topological effects (30, 94) and to result in more stable network metrics (95). This makes it a commonly used approach for network construction and analysis in disease-related studies. Here, we focused on two thresholds that are commonly reported in the literature: 10% (PROP_{10}) and 20% (PROP_{20}).
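Proportional thresholding can be sketched as keeping the strongest fraction of all possible edges. The function below is a hypothetical illustration on a toy five-node matrix; rounding the number of retained edges to the nearest integer is an assumption.

```python
def proportional_threshold(W, proportion=0.10):
    """Keep the strongest `proportion` of the N(N-1)/2 possible edges."""
    n = len(W)
    edges = sorted(((W[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    keep = round(proportion * len(edges))   # assumed rounding rule
    A = [[0.0] * n for _ in range(n)]
    for w, i, j in edges[:keep]:
        A[i][j] = A[j][i] = w
    return A

# Toy weights for the 10 node pairs of a five-node graph.
weights = {(0, 1): 0.9, (0, 2): 0.1, (0, 3): 0.2, (0, 4): 0.3, (1, 2): 0.8,
           (1, 3): 0.4, (1, 4): 0.15, (2, 3): 0.7, (2, 4): 0.25, (3, 4): 0.6}
n_nodes = 5
W = [[0.0] * n_nodes for _ in range(n_nodes)]
for (i, j), w in weights.items():
    W[i][j] = W[j][i] = w

A20 = proportional_threshold(W, proportion=0.20)   # keeps 2 of 10 edges
```

Because the number of retained edges depends only on the number of nodes, every subject's graph has the same density under this scheme.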

### Degree Thresholding (DEG)

The structure of a graph can be biased by the number of existing edges. Accordingly, statistical measures derived from the graph should be compared against graphs that preserve the same average degree, *K*. A threshold of the degree can be chosen to produce graphs with a fixed mean degree (e.g., *K* = 5, DEG_{5}), where the mean degree is the average nodal degree of an individual graph from a single subject’s scan. Many network neuroscience studies have adopted *K* = 5 (96–99). We also include the DEG_{15} for denser graphs of the brain networks.
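Since the mean degree of an undirected graph with $N$ nodes and $E$ edges is $2E/N$, fixing *K* fixes the number of retained edges and hence the density. The helper functions below are hypothetical and only illustrate this arithmetic; the node counts in the example are arbitrary.

```python
def edges_for_mean_degree(n_nodes, k):
    """Number of undirected edges giving mean degree k (mean degree = 2E / N)."""
    return round(k * n_nodes / 2)

def density_for_mean_degree(n_nodes, k):
    """Edge density implied by mean degree k: E / (N(N-1)/2) = k / (N - 1)."""
    return k / (n_nodes - 1)

print(edges_for_mean_degree(100, 5))              # 250
print(round(density_for_mean_degree(100, 5), 4))  # 0.0505
```

The same *K* therefore corresponds to a lower density in finer parcellations, which is one way DEG differs from PROP.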

### Global Cost Efficiency Optimization (GCE)

Given a network with a cost $\rho$, its global efficiency is a function of the cost, $E_g(\rho)$, and its GCE is $J(\rho) = E_g(\rho) - \rho$. Several studies suggested that brain networks, in particular those with small-world topology, maximize their global-cost efficiency (45), i.e., $J^{max} = \max_{\rho} J(\rho)$. Computationally, this scheme is implemented by looping over all network costs (e.g., adding edges with weights in order) to find the $J^{max}$ (see Fig. 2b), where the corresponding edge weight is taken as the threshold for edge filtering. In this sense, GCE is an individualised and optimised version of ABS, PROP and DEG, while the latter three are commonly employed with a fixed threshold for all individuals.
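The search over network costs can be sketched as follows: edges are added from strongest to weakest, and the cost at which global efficiency minus cost peaks defines the threshold. This is a minimal sketch under stated assumptions, namely that global efficiency is computed on the binarized graph via breadth-first search and that cost is the fraction of possible edges present; the function names are hypothetical and this is not the study's implementation.

```python
from collections import deque

def global_efficiency(adj):
    """Mean inverse shortest-path length over ordered node pairs (binary graph)."""
    n = len(adj)
    total = 0.0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes contribute 0 (infinite distance).
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))

def gce_threshold(W):
    """Return (edge weight, cost, J) at the maximum of J = E_g - cost."""
    n = len(W)
    edges = sorted(((W[i][j], i, j) for i in range(n) for j in range(i + 1, n)
                    if W[i][j] > 0), reverse=True)
    max_edges = n * (n - 1) / 2
    adj = [set() for _ in range(n)]
    best_j, best_w, best_cost = float("-inf"), None, None
    for count, (w, i, j) in enumerate(edges, start=1):
        adj[i].add(j)
        adj[j].add(i)
        cost = count / max_edges
        j_val = global_efficiency(adj) - cost
        if j_val > best_j:
            best_j, best_w, best_cost = j_val, w, cost
    return best_w, best_cost, best_j

# Toy four-node weight matrix.
W = [
    [0.0, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.8, 0.2],
    [0.3, 0.8, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]
threshold, cost, j_max = gce_threshold(W)
```

In this toy case the maximum is reached once the three strongest edges form a connected chain; adding further edges raises cost faster than efficiency.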

### Overall Efficiency Cost Optimization (ECO)

Both global and local efficiency are important graph features to characterize the structure of complex systems in terms of integration and segregation of information (100). ECO was proposed to determine a network density threshold for filtering out the weakest links (101). It maximizes an extension of $J^{max}$, the ratio between the overall (both global and local) efficiency and its wiring cost, i.e., $\max_{\rho} J^{ext}(\rho)$ with $J^{ext}(\rho) = (E_g(\rho) + E_{loc}(\rho))/\rho$, where $E_{loc}$ denotes the network local efficiency. The study (100) also demonstrated that, to maximize $J$, these networks have to be sparse with an average node degree $K \simeq 3$.

### Minimum Spanning Tree (MST)

This is an increasingly popular method for identifying the smallest and most essential set of connections while ensuring that the network forms a fully connected graph (102–105). The tenet of using MST is to summarize information and index structure of the graph, and thus remove edges with redundant information (41). Specifically, an MST filtered graph will contain *N* nodes connected *via N* – 1 connections with minimal cost and no loops. This addresses key issues in existing topology filtering schemes that rely on arbitrary and user-specified absolute thresholds or densities.
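A spanning tree of this kind can be obtained with Kruskal's algorithm. The sketch below keeps the strongest connections, which matches a minimum-cost tree under the assumption that cost decreases monotonically with connection weight; the text does not specify how weights were converted to costs, and the function name is hypothetical.

```python
def maximum_spanning_tree(W):
    """Kruskal's algorithm on a dense weight matrix, strongest edges first.

    Returns the N - 1 retained edges as (i, j, weight) tuples.
    """
    n = len(W)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(((W[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # adding this edge creates no loop
            parent[root_i] = root_j
            tree.append((i, j, w))
            if len(tree) == n - 1:
                break
    return tree

# Toy four-node weight matrix.
W = [
    [0.0, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.8, 0.2],
    [0.3, 0.8, 0.0, 0.7],
    [0.1, 0.2, 0.7, 0.0],
]
tree = maximum_spanning_tree(W)
```

The result always has exactly $N - 1$ edges and connects every node, so no density or weight threshold has to be chosen.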

### Orthogonal Minimum Spanning Tree (OMST)

This topological filtering scheme was proposed recently (42) to maximize the information flow over the network *versus* the cost by selecting the connections via the OMSTs. It samples the full-weighted brain network over consecutive rounds of MST that are orthogonal to each other (see Fig. 2b). Practically, we extracted the 1st MST, then cleared its connections and tracked the 2nd MST from the rest of the network connections, and so on. Such an iterative procedure (stopped at the *M*th MST) yields orthogonal MSTs and topologically filters the brain network by optimizing the GCE under the constraints of the MST, leading to an integration of both GCE and MST.

### Planar Maximally Filtered Graph (PMFG)

The idea behind PMFG (43) is to filter a dense matrix of weights by retaining the largest possible subgraph while imposing global constraints on the topology of the resulting network. In the MST, edges with strong connection weights are retained while constraining the subgraph to be a (spanning) tree globally. Similarly, during the PMFG construction, the largest weights are retained while constraining the subgraph to be a planar graph globally. The PMFG algorithm searches for the maximum weighted planar subgraph by adding edges one by one: it starts by sorting all the edges of a dense matrix of weights in non-increasing order and tries to insert every edge in the PMFG, discarding edges that violate the planarity constraint. The resulting matrix is sparse, with 3(*N* – 2) edges.

### Triangulated Maximally Filtered Graph (TMFG)

The algorithm for implementing PMFG is computationally expensive, and is therefore impractical when applied to large brain networks (44). A more efficient algorithm, TMFG, was developed that exhibits greatly reduced computational complexity compared to PMFG. This method captures the most relevant information between nodes by approximating the network connectivity matrix with the endorsement association matrix and minimizing spurious associations. The TMFG-derived network contains 3-node (triangle) and 4-node (tetrahedron) cliques, imposing a nested hierarchy, and automatically generates a chordal network (44, 106). Although TMFG is not widely applied in network neuroscience studies, it has been applied elsewhere and proven to be a suitable choice for modeling interrelationships between psychological constructs like personality traits (107).

### Orthogonal TMF Graph (OTMFG)

To combine both the TMFG’s efficiency and OMST’s accuracy, we propose OTMFG to maximize the information flow over the network *versus* the cost by selecting the connections of the orthogonal TMFG. It samples the full-weighted brain network over consecutive rounds of TMFG that are orthogonal to each other.

In summary, as illustrated in Fig. 4a, the 12 edge filtering schemes transform a fully weighted matrix into a sparse matrix to represent the corresponding brain network. They can be categorized into two classes: threshold-based *versus* topology-based schemes. ABS_{05}, PROP_{10}, PROP_{20}, DEG_{5}, DEG_{15}, ECO and GCE rely on a threshold for filtering and retaining edges with higher weights than the threshold. These schemes normally ignore the topological structure of the entire network and can result in isolated nodes. In contrast, the topology-based methods including MST, OMST, PMFG, TMFG and OTMFG, all consider the global network topology in determining which edges to retain. As illustrated in Fig. 4b, all the schemes are plotted in the *ρ* – *J ^{max}* plane for their network economics.

### Network Analysis

We performed graph-theory-driven network analysis by calculating several common graph-based metrics for the resulting graphs. These measures, broadly, can be interpreted based on whether they characterize the extent to which network structure allows for integrated or segregated information flow. Examples of integrative measures include average shortest path length ($L_p$), global efficiency ($E_g$), and pseudo diameter ($D$). Segregation measures include clustering coefficient ($C_p$), local efficiency ($E_{local}$), transitivity ($Tr$), modularity ($Q$), and a suite of nodal centrality measures (Appendix 1). All the metrics are calculated using functions included in the Brain Connectivity Toolbox (108). We employed **graph-tool** (https://graph-tool.skewed.de) and **NetworKit** (https://networkit.github.io) to achieve high performance comparable (both in memory usage and computation time) to that of a pure C/C++ library. We treated these metrics as the network measurements for subsequent reliability analysis.
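To illustrate the distinction between integration and segregation measures, the sketch below computes one of each on a small binary graph: the average shortest path length and the mean clustering coefficient. It is a plain-Python illustration with hypothetical function names, not the Brain Connectivity Toolbox, graph-tool or NetworKit routines used in the study, and it assumes an unweighted, undirected graph stored as a list of neighbor sets.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """Average shortest path length over connected node pairs (integration)."""
    total, pairs = 0, 0
    for s in range(len(adj)):
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def clustering_coefficient(adj):
    """Mean fraction of a node's neighbor pairs that are connected (segregation)."""
    values = []
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        values.append(2.0 * links / (k * (k - 1)))
    return sum(values) / len(values)

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
L_p = characteristic_path_length(adj)
C_p = clustering_coefficient(adj)
```

Adding a shortcut edge to this toy graph would lower `L_p`, whereas closing triangles among a node's neighbors raises `C_p`.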

### Reliability Assessments

Measurement reliability is defined as the extent to which measurements can be replicated across multiple repeated measures. Test-retest reliability is the closeness of the agreement between the results of successive measurements of the same measure and carried out under the same conditions of measurement.

#### Linear mixed models

As a group-level statistic, reliability refers to the inter-individual or between-subject variability $V_b$ relative to the intra-individual or within-subject variability $V_w$. Both the intra- and inter-individual variances can be estimated using a linear mixed model (LMM). In this study, given a functional graph metric $\phi$, we considered a random sample of $P$ subjects with $N$ repeated measurements of a continuous variable in $M$ visits. $\phi_{ijk}$ (for $i = 1, \cdots, N$, $j = 1, \cdots, M$, and $k = 1, \cdots, P$) denotes the metric from the $k^{th}$ subject’s $j^{th}$ visit and $i^{th}$ measurement occasion. The three-level LMM models $\phi_{ijk}$ in terms of the following components: $\gamma_{000}$ is a fixed parameter (the group mean), and $p_{0k}$, $\upsilon_{0jk}$ and $e_{ijk}$ are independent random effects, each normally distributed with a mean of 0 and its own variance. The term $p_{0k}$ is the subject effect, $\upsilon_{0jk}$ is the visit effect and $e_{ijk}$ is the measurement residual. Age, gender and the interval ($\Delta t$) between two visits are covariates.

#### ICC Estimation

These variances are used to calculate the test-retest reliability, which is measured by the dependability coefficient and reflects the absolute agreement of measurements. The dependability coefficient is a common form of the intraclass correlation coefficient (ICC), which is the ratio of the variance due to the object of measurement versus that due to sources of error. To avoid negative ICC values and obtain a more accurate estimate of the sample ICC, the variance components in the model are usually estimated with the restricted maximum likelihood (ReML) approach with the covariance structure of an unrestricted symmetrical matrix (26).

A metric with moderate to almost perfect test-retest reliability (ICC ≥ 0.4) is commonly expected in practice. The level of reliability should be judged not only by the point estimate of the ICC but also by its confidence interval (CI) (109). We employed the nonparametric conditional bootstrap method with 1000 resamples to estimate the 95% CIs.
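To give a feel for how an ICC and its bootstrap interval behave, the sketch below uses a simpler stand-in for the procedure above: a one-way random-effects ICC computed from ANOVA mean squares, with a subject-level percentile bootstrap. This is explicitly not the three-level ReML-based LMM or the conditional bootstrap used in the study; the toy data, the function names and the clipping of negative estimates to 0 are assumptions made for illustration.

```python
import random

def icc_oneway(data):
    """One-way random-effects ICC from rows of repeated measures per subject."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    return max(0.0, (msb - msw) / (msb + (k - 1) * msw))  # clipped at 0 (assumption)

def bootstrap_ci(data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from resampling subjects with replacement."""
    rng = random.Random(seed)
    estimates = sorted(icc_oneway(rng.choices(data, k=len(data)))
                       for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: six subjects, two measurements of one graph metric each.
data = [[0.30, 0.32], [0.45, 0.41], [0.52, 0.55],
        [0.38, 0.36], [0.61, 0.58], [0.27, 0.30]]
icc = icc_oneway(data)
ci_low, ci_high = bootstrap_ci(data)
```

With so few subjects the interval is wide relative to the point estimate, which is the reason for judging reliability by the interval as well as the estimate.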

#### Statistics Evaluation

Our analyses can produce big data of reliability statistics including 419,328 ICCs for the global network metrics. These ICCs are grouped into four categories (parcellation, frequency band, connectivity transformation and edge filtering scheme), each of which has different choices. Given each choice of a category, we estimated its density distributions of ICCs and calculated two descriptive statistics: 1) mean ICC values, which measures the *general reliability* under the given choice; 2) number of almost perfect (noap) ICC values, which measures the *potential reliability* under the given choice.
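The two descriptive statistics can be computed as below. The cut-off of 0.8 for "almost perfect" follows the supplementary headings (ICCs > 0.8); the function name and the toy ICC values are hypothetical.

```python
def summarise_iccs(iccs, almost_perfect=0.8):
    """Return (mean ICC, number of almost perfect ICCs) for one pipeline choice."""
    mean_icc = sum(iccs) / len(iccs)
    noap = sum(1 for value in iccs if value > almost_perfect)
    return mean_icc, noap

# Toy ICCs obtained under one choice (e.g., one parcellation).
iccs = [0.35, 0.62, 0.81, 0.90, 0.47]
mean_icc, noap = summarise_iccs(iccs)
```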

We further perform the Friedman rank sum test to evaluate whether the location parameters of the distribution of ICCs are the same in each choice. Once the Friedman test is significant, we employ the pairwise Wilcoxon signed rank test for post-hoc evaluations to compare ICCs between each pair of the distributions under different choices. The statistical significance levels are corrected with the Bonferroni method for controlling the family-wise error rate at a level of 0.05. We develop a method to visualize and evaluate the change of ICCs (i.e., reliability gradient) between different choices (Fig. 1c). Specifically, the reliability can be plotted as a function of $V_b$ and $V_w$ in its anatomy plane (15, 16). The gradient of reliability between two choices is modeled by the vector (i.e., the black arrow), and decomposed into changes of individual variability. The systematic evaluation of the reliability of the global network metrics determines the optimal network neuroscience pipeline by combining the most reliable pipeline choices. Finally, the optimized pipeline generates the nodal metrics as well as their reliability.
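The decomposition of a reliability change into variability changes can be sketched as follows, assuming the common two-component form $ICC = V_b / (V_b + V_w)$. That form, the function names and the toy variances are assumptions for illustration; the study's estimates come from the three-level LMM.

```python
def icc_from_variances(v_between, v_within):
    """ICC under the assumed two-component form V_b / (V_b + V_w)."""
    return v_between / (v_between + v_within)

def reliability_gradient(choice_a, choice_b):
    """Vector from choice A to choice B in the (V_w, V_b) plane, plus the ICC change.

    Each choice is a (V_b, V_w) pair.
    """
    (vb_a, vw_a), (vb_b, vw_b) = choice_a, choice_b
    return {
        "delta_vb": vb_b - vb_a,
        "delta_vw": vw_b - vw_a,
        "delta_icc": icc_from_variances(vb_b, vw_b) - icc_from_variances(vb_a, vw_a),
    }

# Toy example: choice B halves within-subject variability at equal between-subject variability.
gradient = reliability_gradient((0.6, 0.4), (0.6, 0.2))
```

In this toy case the whole gain in reliability comes from reduced within-subject variability, which is the kind of attribution the gradient is meant to expose.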

## Supplementary Information

### 1 Parcellations - Node Definition

#### 1.1 ICC Density distribution

#### 1.2 ICC Almost Perfect (ICCs > 0.8)

#### 1.3 Substantial or Above (ICCs > 0.6)

#### 1.4 Descriptive statistics Mean

#### 1.5 Descriptive statistics Median

#### 1.6 Friedman Test

#### 1.7 Friedman Test Effect size

#### 1.8 Paired Wilcoxon signed rank test

#### 1.9 Significance Map

### 2 Frequency Bands - Edge Construction

#### 2.1 ICC Density distribution

#### 2.2 Almost Perfect (ICCs > 0.8)

#### 2.3 Substantial or Above (ICCs > 0.6)

#### 2.4 Variability Changes

#### 2.5 Descriptive statistics Mean

#### 2.6 Descriptive statistics Median

#### 2.7 Friedman Test

#### 2.8 Friedman Test Effect size

#### 2.9 Paired Wilcoxon signed rank test

#### 2.10 Significance Map

#### 2.11 Effect size

### 3 R Transforms - Edge Construction

Many network metrics are not well defined for negatively weighted connections. In order to ensure that the connection weights are positive only, we applied four types of transformations to the symmetric correlation matrix: the **positive** (Eq.pos), **absolute** (Eq.abs), **exponential** (Eq.exp) and **distance-inverse** (Eq.div) functions, respectively. This avoids the negative values in the inter-node connectivity matrix $W = (w_{ij})$, where $z_{ij} = \tanh^{-1}(r_{ij})$ is Fisher’s $z$-transformation.

#### 3.1 ICC Density distribution

#### 3.2 Almost Perfect (ICCs > 0.8)

#### 3.3 Substantial or Above (ICCs > 0.6)

#### 3.4 Descriptive statistics Mean

#### 3.5 Descriptive statistics Median

#### 3.6 Friedman Test

#### 3.7 Friedman Test Effect size

#### 3.8 Paired Wilcoxon signed rank test

#### 3.9 Significance map

#### 3.10 Effect size

### 4 Schemes - Edge Construction

#### 4.1 ICC Density distribution

#### 4.2 Almost Perfect (ICCs > 0.8)

#### 4.3 Substantial or Above (ICCs > 0.6)

#### 4.5 Descriptive statistics Mean

#### 4.6 Descriptive statistics Median

#### 4.7 Friedman Test

#### 4.8 Friedman Test Effect size

#### 4.9 Paired Wilcoxon signed rank test

#### 4.10 Significance map

### 5 Metrics - Network Analysis

#### 5.1 Metrics

Superscript $b$ (e.g., $Eg^{b}$) is used for the binary graphs, superscript $w$ (e.g., $Eg^{w}$) is used for the weighted graphs, and superscript $n$ (e.g., $Eg^{n}$) is used for the normalized weighted graphs.

#### 5.2 ICC Density distribution

#### 5.3 Almost Perfect (ICCs > 0.8)

#### 5.4 Substantial or Above (ICCs > 0.6)

#### 5.5 Descriptive statistics Mean

#### 5.6 Descriptive statistics Median

#### 5.7 Paired Wilcoxon signed rank test

#### 5.8 Significance map

#### 5.9 Effect size

### 6 More Metrics - Network Analysis

#### 6.1 ICC Density distribution

#### 6.2 Almost Perfect (ICCs > 0.8)

#### 6.3 Substantial or Above (ICCs > 0.6)

#### 6.4 Descriptive statistics Mean

#### 6.5 Descriptive statistics Median

#### 6.6 Significance map

## ACKNOWLEDGEMENTS

This work was supported in part by the Startup Funds for Leading Talents at Beijing Normal University, the National Basic Science Data Center ‘Chinese Data-sharing Warehouse for In-vivo Imaging Brain’ (NBSDC-DB-15), Beijing Municipal Science and Technology Commission (Z161100002616023, Z181100001518003), the Key-Area Research and Development Program of Guangdong Province (2019B030335001), Guangxi BaGui Scholarship (201621), and Indiana University Office of the Vice President for Research Emerging Area of Research Initiative, Learning: Brains, Machines and Children. The neuroimaging data were provided by the HCP WU-Minn Consortium, which is funded by the 16 NIH institutes and centers that support the NIH Blueprint for Neuroscience Research 1U54MH091657 (PIs: David Van Essen and Kamil Ugurbil), and by the McDonnell Center for Systems Neuroscience at Washington University.
