## Abstract

Functional connectivity (FC) networks are typically inferred from resting-state fMRI data using the Pearson correlation between BOLD time series from pairs of brain regions. However, alternative methods of estimating functional connectivity have not been systematically tested for their sensitivity or robustness to head motion artifact. Here, we evaluate the sensitivity of six different functional connectivity measures to motion artifact using resting-state data from the Human Connectome Project. We report that correlation-based measures (Pearson and Spearman correlation) have a relatively high residual distance-dependent relationship with motion compared to coherence and information theory-based measures, even after implementing rigorous methods for motion artifact mitigation. This disadvantage of correlation-based measures, however, may be offset by their higher test-retest reliability and system identifiability. We highlight spatial differences in the sub-networks affected by motion with different FC metrics. Further, we report that intra-network edges in the default mode and retrosplenial temporal sub-networks are highly correlated with motion in all FC methods. Our findings indicate that the method of estimating functional connectivity is an important consideration in resting-state fMRI studies and must be chosen carefully based on the parameters of the study.

## Introduction

Ever since the initial observation of correlations in spontaneous functional magnetic resonance imaging (fMRI) blood-oxygen level dependent (BOLD) signals acquired from subjects at rest, the field of resting-state functional connectivity has grown exponentially^{1}. Functional connectivity has been used as a tool to explore large-scale features of human brain organization^{2–6}, how this organization changes over the course of development^{7–13}, and the association of such organization with individual behavior^{14,15}. However, head motion artifact is a pervasive problem in functional connectivity analysis, decreasing certainty of findings and impacting subsequent interpretations. In-scanner head movements result in structured noise that leads to the spurious identification of putative functional connections, a problem further compounded by the fact that some individuals move systematically more than others^{16}. Consequently, a number of research groups have developed statistical preprocessing methods to mitigate the impact of motion artifact; the development of such methods is an ongoing and active field of research in its own right^{17–20}.

While much effort has been directed toward developing effective denoising pipelines to mitigate motion artifact, the subsequent estimation of functional connectivity in fMRI data has remained fairly constant. Functional connectivity (FC) between brain regions is typically estimated through a Pearson correlation between the BOLD time series of two regions of interest (ROIs). However, functional connectivity is formally defined as any statistical relation between time series^{21} and there exist many other statistical methods to compute similarity between time series. A few examples include coherence methods^{22–24}, which compute similarity in frequency space, and methods based on information theory^{25–27}, which quantify the amount of shared information between signals.

The performance of different FC estimation methods has been evaluated using generative models for a variety of neurophysiological data, including simulated BOLD signals^{28,29}. These studies have typically focused on the ability of FC estimation methods to recover the underlying network structure from simulated BOLD data. Major findings include the success of full or partial correlation at recovering the underlying network structure in simulated data^{28}. It is perhaps due to these results, and the associated ease of implementation, that correlation-based methods are so popular in the field. Yet, very few studies have used real fMRI data to compare the differential sensitivity or robustness of different FC estimation methods to motion artifact. The field awaits an appraisal of different FC estimation approaches with regard to their ability to overcome the specific type of noise introduced by motion artifact in fMRI data^{16,30}.

In the present study, we used resting state fMRI data from the Human Connectome Project (HCP) to evaluate six different FC estimation methods: Pearson correlation, Spearman correlation, coherence, wavelet coherence, mutual information in the time domain, and mutual information in the frequency domain. The sensitivity of each of these methods to subject motion and their success in identifying network structure was evaluated using four benchmarks: (a) correlations of subject motion with edge weights after denoising (QC-FC correlations), (b) the distance-dependence of QC-FC correlations, (c) the degree to which canonical brain systems could be identified through modularity maximization, and (d) the extent to which the functional connectivity estimates could be reproduced in successive scans (test-retest reliability). Collectively, these efforts serve to inform our usage of FC estimation methods, and their relative strengths and weaknesses.

## Methods

In order to evaluate the differential sensitivity of different FC estimation methods to motion, we first applied common denoising pipelines to a large resting state dataset, estimated functional connectivity matrices using 6 different methods, and finally compared the performance of each of these estimates using a set of common quality control (QC) measures. Details of data preprocessing, FC estimation, and QC measures are described below.

### Data and preprocessing

In this study, we leveraged data from the S1200 release of the Human Connectome Project (HCP)^{31}, a multi-site consortium that collected extensive MRI, behavioral, and demographic data from a large cohort of over 1000 subjects. As part of the HCP protocol, subjects underwent four separate resting-state scans, which included both left-right (REST1_LR, REST2_LR) and right-left (REST1_RL, REST2_RL) phase encoding directions. All functional connectivity data analyzed in this report came from these scans.

We preprocessed the ICA-FIX resting-state data provided by the Human Connectome Project, which used 24-parameter regression followed by ICA+FIX denoising to remove nuisance and motion signals^{32,33}. In addition, we removed the mean global signal and bandpass filtered the time series from 0.009 to 0.08 Hz. Further, we did not analyze subjects for whom greater than 50% of frames had a framewise displacement above 0.2 millimeters or a derivative root mean square above 75, leaving 802 subjects from REST1_LR, 820 from REST1_RL, 798 from REST2_LR, and 790 from REST2_RL. This threshold was chosen because it is typical for analyses of functional connectivity, and we wanted our conclusions about motion and functional connectivity to apply to common analysis pipelines^{34–37}.

For each scan, we used the mean relative root-mean-square (RMS) displacement estimated during MCFLIRT realignment, provided by the Human Connectome Project, as our primary measure of motion. Summary statistics of the cohorts analyzed and their head motion are shown in Table 1.

From the preprocessed data, we estimated mean BOLD time series using two cortical parcellation schemes: the 333-node Gordon parcellation^{38} and the 100-node Schaefer parcellation^{35}. For all scans, the MSMAll registration was used, and the mean time series of vertices on the cortical surface (fs_LR 32k mesh) in each parcel was calculated.

### Description of functional connectivity measures

In the present study, we evaluated 6 different methods for estimating functional connectivity from BOLD time series data. Here we provide a brief overview of the methods evaluated.

### Pearson’s correlation coefficient

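The Pearson correlation coefficient is a simple and ubiquitous method to evaluate linear correlation between two time series as the covariance between the two signals over time divided by the product of their standard deviations. The zero-order (without lag) Pearson correlation, $\rho_{ij}$, between the signals $x_i(t)$ and $x_j(t)$ of regions $i$ and $j$ is given by

$$\rho_{ij} = \frac{\sum_{t}\left(x_i(t)-\bar{x}_i\right)\left(x_j(t)-\bar{x}_j\right)}{\sqrt{\sum_{t}\left(x_i(t)-\bar{x}_i\right)^2}\,\sqrt{\sum_{t}\left(x_j(t)-\bar{x}_j\right)^2}},$$

where $\bar{x}_i$ and $\bar{x}_j$ are the temporal means of the two signals.

Note that $\rho_{ij}$ varies in the interval [-1, 1], with positive values indicating positive correlation and negative values indicating negative correlation.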
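As a minimal sketch, assuming the parcel-averaged BOLD signals are stored as a NumPy array with one column per region (a layout we assume for illustration; the data below are synthetic), the full matrix of zero-lag Pearson coefficients can be obtained in one call:

```python
import numpy as np


def pearson_fc(ts):
    """Zero-lag Pearson FC matrix from a (timepoints x regions) array."""
    # np.corrcoef treats rows as variables, so pass regions as rows
    return np.corrcoef(np.asarray(ts, dtype=float).T)


# Hypothetical example: 1200 frames, 5 parcels, with parcels 0 and 1 made to covary
rng = np.random.default_rng(0)
ts = rng.standard_normal((1200, 5))
ts[:, 1] = ts[:, 0] + 0.5 * rng.standard_normal(1200)
fc = pearson_fc(ts)
```

The result is symmetric with ones on the diagonal; each off-diagonal entry equals the explicit sum-based formula above.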

### Spearman’s rank correlation coefficient

This method evaluates the rank correlation between two time series, providing an estimate of the extent to which one time series is a monotonic function of the other. The zero-order Spearman rank correlation, $r_{ij}$, between the signals of regions $i$ and $j$ is given by the Pearson correlation between their respective rankings,

$$r_{ij} = \frac{\operatorname{cov}\left(R_i, R_j\right)}{\sigma_{R_i}\,\sigma_{R_j}},$$

where $R_i$ and $R_j$ are the rankings of the individual time series from low to high values. Note that $r_{ij}$ varies in the interval [-1, 1], with $r_{ij} = +1$ indicating a perfectly monotonically increasing relationship between the time series of region $i$ and the time series of region $j$.
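Because the Spearman coefficient is a Pearson correlation applied to ranks, a sketch needs only a ranking step. This assumes the same hypothetical timepoints × regions layout as before and uses synthetic data; SciPy's `spearmanr` is called only as a cross-check.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_fc(ts):
    """Spearman FC: Pearson correlation of the within-region ranks of each time series."""
    # Rank each region's series over time (axis=0); tied values receive their average rank
    ranks = rankdata(np.asarray(ts, dtype=float), axis=0)
    return np.corrcoef(ranks.T)


# Hypothetical example: parcel 1 is a nonlinear but monotonic function of parcel 0
rng = np.random.default_rng(1)
ts = rng.standard_normal((500, 3))
ts[:, 1] = np.exp(ts[:, 0])
fc = spearman_fc(ts)
rho_ref, _ = spearmanr(ts)  # SciPy's reference implementation, for comparison
```

Here the exponential relationship gives a Spearman coefficient of 1, while the Pearson coefficient of the same pair is noticeably lower because the relationship is not linear.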

### Mutual information (time domain)

The mutual information is a statistical measure of the shared information between two time series. The information content of a given time series $X(t)$ can be defined through its Shannon entropy^{25,26}, which is given by

$$H(X) = -\sum_{i=1}^{M} p_i \log p_i,$$

where $X(t)$ is partitioned into $M$ bins, with $p_i$ representing the probability of the $i$-th bin. Now, the joint entropy between $X(t)$ and a second time series $Y(t)$ is defined as

$$H(X,Y) = -\sum_{i=1}^{M}\sum_{j=1}^{M} p_{ij} \log p_{ij},$$

where $p_{ij}$ is the joint probability of $X = X_i$ and $Y = Y_j$. The mutual information between $X$ and $Y$ is then

$$I(X,Y) = H(X) + H(Y) - H(X,Y).$$

In order to obtain values in the range [0, 1], we computed the normalized mutual information^{39}.

Thus, the normalized mutual information between two independent signals is 0 and has a maximum of 1 for identical signals. While the Pearson and Spearman correlation coefficients measure linear and monotonic relationships, respectively, the mutual information is a statistical measure of both linear and non-linear relationships between time series.
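The entropy definitions above translate directly into a histogram estimator. This is a minimal sketch under two assumptions that are not stated in this section: $M = 10$ equal-width bins (a hypothetical choice), and normalization by $\sqrt{H(X)H(Y)}$, which is one common choice and not necessarily the form used in ref. 39. The data are synthetic.

```python
import numpy as np


def shannon_entropy(p):
    """Entropy in nats of a probability array; empty bins contribute 0 (0 log 0 = 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def normalized_mi(x, y, bins=10):
    """Histogram-based mutual information, normalized by sqrt(H(X) H(Y)) (assumed normalization)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()                     # joint probabilities p_ij
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)    # marginal bin probabilities
    h_x, h_y = shannon_entropy(p_x), shannon_entropy(p_y)
    mi = h_x + h_y - shannon_entropy(p_xy.ravel())   # I = H(X) + H(Y) - H(X, Y)
    return mi / np.sqrt(h_x * h_y)


rng = np.random.default_rng(2)
x, y = rng.standard_normal(1200), rng.standard_normal(1200)
nmi_same, nmi_indep = normalized_mi(x, x), normalized_mi(x, y)
nmi_quad = normalized_mi(x, x**2)  # nonlinear, non-monotonic dependence
```

The last line illustrates the point made above: $x$ and $x^2$ are almost uncorrelated in the Pearson sense, yet they share substantial mutual information. Note that finite sampling biases the estimate slightly above 0 for independent signals.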

### Coherence

Coherence is a measure of the cross-correlation between two signals in the frequency domain. At a given frequency $\lambda$, the coherence between the signal of region $i$ and the signal of region $j$ is given by

$$C_{ij}(\lambda) = \frac{\left|f_{ij}(\lambda)\right|^2}{f_{ii}(\lambda)\, f_{jj}(\lambda)},$$

where $f_{ij}(\lambda)$ is the cross-spectral density between signals $i$ and $j$, and $f_{ii}(\lambda)$ and $f_{jj}(\lambda)$ are the auto-spectral densities of signals $i$ and $j$, respectively. Note that $C_{ij}(\lambda)$ varies in the interval [0, 1].

We evaluated coherence using the MATLAB toolbox for functional connectivity^{40}, in which spectral densities are calculated using Welch’s averaged, modified periodogram method.
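The following Python sketch is not the MATLAB toolbox used in the study, but it computes the same kind of quantity: SciPy's `scipy.signal.coherence` returns Welch-based magnitude-squared coherence, which we then average over the 0.009–0.08 Hz band used throughout. The sampling rate assumes the HCP resting-state repetition time of 0.72 s; the segment length `nperseg=256` (frequency resolution of about 0.0054 Hz) is our own hypothetical choice, not a documented toolbox setting. The data are synthetic.

```python
import numpy as np
from scipy.signal import coherence

TR = 0.72  # HCP resting-state repetition time (s), so fs = 1 / TR


def band_coherence(x, y, fs=1 / TR, fmin=0.009, fmax=0.08, nperseg=256):
    """Welch magnitude-squared coherence, averaged over the [fmin, fmax] Hz band."""
    # scipy.signal.coherence uses Welch's method (Hann window, 50% overlap by default)
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    band = (f >= fmin) & (f <= fmax)
    return cxy[band].mean()


rng = np.random.default_rng(3)
x, y = rng.standard_normal(1200), rng.standard_normal(1200)
coupled = x + 0.3 * rng.standard_normal(1200)
```

Shorter segments give more averaged periodograms (lower variance) but coarser frequency resolution, so `nperseg` must be long enough that several bins fall inside the narrow low-frequency band.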

### Wavelet coherence

Wavelet coherence is a measure of the correlation between two signals in the time-frequency space. It is calculated in a similar manner to coherence, but spectral densities are calculated by convolving time series with wavelet functions such as the Morlet wavelet function, which expand the signal in time-frequency space. We evaluated wavelet coherence using the Grinsted toolbox^{41}.
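To make the time-frequency construction concrete, here is a simplified Python sketch. It is not the Grinsted toolbox, which is MATLAB and also smooths across scales. The assumptions are as follows: a Morlet wavelet with $\omega_0 = 6$, a Gaussian smoothing operator $S$ applied in time only, and coherence computed as $|S(W_{xy})|^2 / (S(|W_x|^2)\,S(|W_y|^2))$. Twelve scales, whose Fourier periods span $1/0.08$ to $1/0.009$ s, are averaged together with all time points, and the cone of influence is ignored. The number of scales, the smoothing width, and all function names are hypothetical. Without smoothing, this ratio would be identically 1, which is why the smoothing step is essential.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import fftconvolve

OMEGA0 = 6.0  # Morlet centre frequency (nondimensional); a common default, assumed here
# Converts a Morlet scale (s) to its equivalent Fourier period (s) for this OMEGA0
FOURIER_FACTOR = 4 * np.pi / (OMEGA0 + np.sqrt(2 + OMEGA0**2))


def morlet_cwt(x, scales, dt):
    """Continuous wavelet transform of x, one row per scale, by direct convolution."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    out = np.empty((len(scales), len(x)), dtype=complex)
    for k, s in enumerate(scales):
        half = int(np.ceil(4 * s / dt))  # Gaussian envelope is negligible beyond 4 scales
        eta = np.arange(-half, half + 1) * dt / s
        psi = np.pi**-0.25 * np.exp(1j * OMEGA0 * eta - eta**2 / 2)
        # For the Morlet wavelet conj(psi(-eta)) == psi(eta), so convolving with psi
        # yields the CWT inner product with the shifted, conjugated wavelet
        out[k] = fftconvolve(x, psi * np.sqrt(dt / s), mode="same")
    return out


def smooth_time(a, sigma):
    """Gaussian smoothing along time; complex arrays are smoothed part by part."""
    if np.iscomplexobj(a):
        return gaussian_filter1d(a.real, sigma) + 1j * gaussian_filter1d(a.imag, sigma)
    return gaussian_filter1d(a, sigma)


def wavelet_coherence(x, y, dt=0.72, fmin=0.009, fmax=0.08, n_scales=12):
    """Time- and band-averaged wavelet coherence (simplified: smoothing in time only)."""
    periods = np.geomspace(1 / fmax, 1 / fmin, n_scales)
    scales = periods / FOURIER_FACTOR
    wx, wy = morlet_cwt(x, scales, dt), morlet_cwt(y, scales, dt)
    coh = np.empty(wx.shape)
    for k, s in enumerate(scales):
        sigma = s / dt  # smoothing width (in samples) grows with scale
        num = np.abs(smooth_time(wx[k] * np.conj(wy[k]), sigma)) ** 2
        den = smooth_time(np.abs(wx[k]) ** 2, sigma) * smooth_time(np.abs(wy[k]) ** 2, sigma)
        coh[k] = num / den
    return coh.mean()


rng = np.random.default_rng(6)
x, y = rng.standard_normal(1200), rng.standard_normal(1200)
coupled = x + 0.3 * rng.standard_normal(1200)
```

Because the smoothing weights are nonnegative, the Cauchy-Schwarz inequality keeps every coherence value in [0, 1], matching the range stated for this metric.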

### Mutual information (frequency domain)

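Mutual information can also be evaluated based on coherence in the frequency domain^{27}, defined for a given frequency range [$\lambda_1$, $\lambda_2$] as

$$\delta_{ij} = -\frac{1}{2\pi}\int_{\lambda_1}^{\lambda_2} \log\left(1 - C_{ij}(\lambda)\right)\, d\lambda.$$

With a simple transformation, a normalized mutual information in the range [0, 1] can be obtained as

$$\hat{\delta}_{ij} = \sqrt{1 - \exp\left(-2\,\delta_{ij}\right)}.$$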

We used the implementation provided in the MATLAB toolbox for functional connectivity^{40}.
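A Python sketch of the two frequency-domain mutual information formulas follows. It is not the toolbox implementation. It assumes that $\lambda$ is angular frequency in radians per sample, so that a frequency $f$ in Hz maps to $\lambda = 2\pi f \cdot TR$; this convention, the Welch settings, and the trapezoidal integration over the Welch bins inside the band are all our assumptions. The data are synthetic.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import coherence

TR = 0.72  # HCP resting-state repetition time (s)


def frequency_mi(x, y, dt=TR, fmin=0.009, fmax=0.08, nperseg=256):
    """Normalized frequency-domain mutual information from Welch coherence."""
    f, c = coherence(x, y, fs=1 / dt, nperseg=nperseg)
    band = (f >= fmin) & (f <= fmax)
    c_band = np.clip(c[band], 0.0, 1.0)  # guard against rounding just above 1
    lam = 2 * np.pi * f[band] * dt  # assumed convention: angular frequency, rad/sample
    with np.errstate(divide="ignore"):  # C = 1 gives log(0) = -inf, i.e. infinite MI
        delta = -trapezoid(np.log1p(-c_band), lam) / (2 * np.pi)
    return np.sqrt(1 - np.exp(-2 * delta))  # maps delta >= 0 into [0, 1]


rng = np.random.default_rng(4)
x, y = rng.standard_normal(1200), rng.standard_normal(1200)
coupled = x + 0.3 * rng.standard_normal(1200)
mi_indep, mi_coupled = frequency_mi(x, y), frequency_mi(x, coupled)
```

The transformation $\sqrt{1 - e^{-2\delta}}$ mirrors the Gaussian relation $I = -\tfrac{1}{2}\log(1-\rho^2)$, so $\hat{\delta}_{ij}$ behaves like a correlation magnitude.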

### Overview of functional connectivity estimation

Using the 6 measures described above, we estimated functional connectivity between each pair of preprocessed BOLD time series, resulting in an *n* × *n* matrix for each subject, where *n* is the number of parcels in the parcellation scheme. The estimated networks provide a description of interactions (edges) among brain regions (nodes) that can then be probed for various features of interest using network science^{42}.
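Any of the pairwise measures sketched above can be turned into an *n* × *n* matrix with a generic loop over the $n(n-1)/2$ region pairs. That is 55,278 pairs for the 333-node Gordon parcellation and 4,950 for the 100-node parcellation. The helper below is hypothetical. It assumes a timepoints × regions array and fills the diagonal with 1 as a placeholder, since self-connections are not analyzed.

```python
import itertools

import numpy as np


def fc_matrix(ts, pairwise):
    """Fill a symmetric n x n FC matrix by applying `pairwise(x, y)` to every region pair."""
    ts = np.asarray(ts, dtype=float)
    n = ts.shape[1]
    fc = np.eye(n)  # placeholder diagonal; self-connections are not analyzed
    for i, j in itertools.combinations(range(n), 2):
        fc[i, j] = fc[j, i] = pairwise(ts[:, i], ts[:, j])  # measures are symmetric
    return fc


rng = np.random.default_rng(7)
ts = rng.standard_normal((200, 4))
fc = fc_matrix(ts, lambda a, b: np.corrcoef(a, b)[0, 1])
```

For Pearson correlation the vectorized `np.corrcoef` is far faster. The loop is useful for measures such as coherence or mutual information that are defined only pairwise.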

The 6 metrics evaluated can be broadly classified into categories based on their mode of operation. Pearson, Spearman, and mutual information (time) work in the time domain, whereas coherence and mutual information (frequency) work in the frequency domain, and wavelet coherence works in time-frequency space. For all the frequency-based methods, we evaluated the average connectivity in the frequency range [0.009 Hz, 0.08 Hz], which is the same range used to bandpass filter all resting-state scans. Note that while Pearson and Spearman correlations fall in the interval [-1, 1], all the other metrics fall in the interval [0, 1] (Table 2).

Correlation matrices are typically subjected to Fisher’s r-to-z transformation, which maps correlations from (-1, 1) onto the real line and makes their sampling distribution approximately normal. Figure S1 shows that results do not change significantly when the transform is applied to the correlation matrices. Therefore, we report results in the main text without performing Fisher transforms on any of the matrices, in order to facilitate a more direct comparison between methods.
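The Fisher transform is $z = \operatorname{arctanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r}$. A minimal sketch for a correlation matrix must handle the diagonal, where $r = 1$ maps to infinity. Masking the diagonal with NaN, as below, is our own choice; the section does not say how the diagonal was treated.

```python
import numpy as np


def fisher_z(r):
    """Fisher r-to-z transform of a correlation matrix, with the diagonal masked."""
    r = np.array(r, dtype=float)  # copy so the caller's matrix is not modified
    np.fill_diagonal(r, np.nan)  # arctanh(1) is infinite, so mask self-correlations
    return np.arctanh(r)


z = fisher_z(np.array([[1.0, 0.5], [0.5, 1.0]]))
```

Because arctanh is strictly increasing, the transform preserves the rank order of edges, so rank-based summaries are unchanged and only magnitude-based ones can differ.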

### Overview of outcome measures

We evaluated the sensitivity of each FC metric to subject motion using four benchmarks: residual QC-FC correlations, distance-dependence of QC-FC correlations, test-retest reliability, and the modularity quality index.

### Residual QC-FC correlations

Quality control-functional connectivity (QC-FC) correlations are a widely used benchmark measure to evaluate the efficacy of denoising pipelines applied in resting-state fMRI connectivity analysis^{17,18}. Here we used this benchmark to evaluate residual motion artifact for each of the functional connectivity estimates after application of a common denoising pipeline. First, we computed functional connectivity based on the 6 metrics described in the previous section, for the 333-node Gordon and 100-node Schaefer parcellation schemes. We then computed the partial correlation between functional connectivity estimates for each edge and the mean relative RMS motion of each subject, controlling for subject age and sex, thus obtaining a distribution of edge-specific correlations with subject motion. From this distribution, we computed the percentage of edges for which the QC-FC correlations were statistically significant (p<0.05, no correction for multiple comparisons).
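A minimal sketch of the QC-FC benchmark computes the partial correlation by regressing age and sex (plus an intercept) out of both the edge weights and motion, then correlating the residuals. It assumes a subjects × edges array of edge weights; the helper names and the synthetic data are hypothetical. The t-test uses $N - 2 - k$ degrees of freedom for $k$ covariates. With no true association, about 5% of edges are expected to pass the uncorrected p<0.05 threshold by chance.

```python
import numpy as np
from scipy import stats


def residualize(y, design):
    """Residuals of y after ordinary least-squares regression on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def qcfc(edges, motion, covariates, alpha=0.05):
    """Edgewise partial correlation between FC and motion, controlling for covariates.

    edges: (subjects x edges); motion: (subjects,); covariates: (subjects x k).
    Returns the partial correlations and the fraction of edges with uncorrected p < alpha.
    """
    n_sub, k = covariates.shape
    design = np.column_stack([np.ones(n_sub), covariates])  # intercept + covariates
    e_res, m_res = residualize(edges, design), residualize(motion, design)
    # Pearson correlation of the residuals equals the partial correlation
    e_z = (e_res - e_res.mean(axis=0)) / e_res.std(axis=0)
    m_z = (m_res - m_res.mean()) / m_res.std()
    r = (e_z * m_z[:, None]).mean(axis=0)
    df = n_sub - 2 - k  # one degree of freedom lost per covariate
    t = r * np.sqrt(df / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df)
    return r, np.mean(p < alpha)


rng = np.random.default_rng(5)
n_sub = 300
covs = np.column_stack([rng.uniform(22, 37, n_sub), rng.integers(0, 2, n_sub)])  # age, sex
motion = rng.gamma(2.0, 0.05, n_sub)
r_null, frac_null = qcfc(rng.standard_normal((n_sub, 500)), motion, covs)
r_mot, frac_mot = qcfc(motion[:, None] + 0.01 * rng.standard_normal((n_sub, 50)), motion, covs)
```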

### Distance-dependence of QC-FC correlations

Motion artifact is known to have a distance-dependent effect on FC estimates, for instance inflating the estimated strength of short-distance connections and reducing the estimated strength of long-distance connections^{16,43}. To quantify this effect, we measured the correlation between the absolute values of QC-FC correlations (see above section) and the Euclidean distance separating the centroids of the node pair associated with each edge. This correlation served as a benchmark for the distance-dependence of the residual motion artifact.
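The distance-dependence benchmark reduces to one correlation across edges. In the sketch below, the section does not specify the type of correlation, so Pearson is assumed. The centroid coordinates are synthetic, and the function name is hypothetical. The edge ordering must match SciPy's `pdist`, which lists the upper triangle row by row.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr


def distance_dependence(qcfc_r, centroids):
    """Correlation, across edges, between |QC-FC| and inter-centroid Euclidean distance.

    qcfc_r must list edges in pdist order: the upper triangle (i < j) read row by row,
    which is the order returned by np.triu_indices(n, k=1).
    """
    dist = pdist(np.asarray(centroids, dtype=float))  # Euclidean by default
    r, _ = pearsonr(np.abs(qcfc_r), dist)
    return r


rng = np.random.default_rng(8)
centroids = rng.uniform(-70, 70, size=(10, 3))  # hypothetical node centroids (mm)
signs = np.where(np.arange(45) % 2, 1.0, -1.0)
qcfc_vals = 0.001 * pdist(centroids) * signs  # |QC-FC| proportional to distance
dd = distance_dependence(qcfc_vals, centroids)
```

Taking absolute values matters because motion can push QC-FC correlations in either direction. The benchmark asks whether the magnitude of the artifact varies with distance.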

### Test-retest reliability

To evaluate the reliability of functional connectivity estimates, we computed the intra-class correlation (ICC) across different resting-state scans performed on the same subject in the HCP dataset. The intra-class correlation coefficient $\rho$ is defined^{44} as

$$\rho = \frac{MS_b - MS_w}{MS_b + (n-1)\,MS_w},$$

where $MS_b$ is the between-subject mean square of each edge's strength, $MS_w$ is the within-subject mean square of each edge's strength, and $n$ is the number of scans per subject, which in this case is 4.
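For a single edge, the formula can be evaluated from a subjects × scans array of its weights using one-way ANOVA mean squares. The sketch below assumes every subject contributes all $n$ scans. The function name and data are hypothetical.

```python
import numpy as np


def icc_oneway(x):
    """One-way random-effects ICC for a (subjects x scans) array of one edge's weights."""
    x = np.asarray(x, dtype=float)
    n_sub, n = x.shape
    subj_means = x.mean(axis=1)
    # Between-subject mean square: spread of subject means around the grand mean
    ms_b = n * np.sum((subj_means - x.mean()) ** 2) / (n_sub - 1)
    # Within-subject mean square: spread of each subject's scans around their own mean
    ms_w = np.sum((x - subj_means[:, None]) ** 2) / (n_sub * (n - 1))
    return (ms_b - ms_w) / (ms_b + (n - 1) * ms_w)


rng = np.random.default_rng(9)
trait = rng.standard_normal((100, 1))
perfect = np.repeat(trait, 4, axis=1)  # identical weight in all 4 scans
noisy = trait + 0.5 * rng.standard_normal((100, 4))  # stable trait plus scan noise
pure_noise = rng.standard_normal((200, 4))  # no subject-specific signal
```

An edge that is identical across scans gives $\rho = 1$ because $MS_w = 0$. Pure scan-to-scan noise gives values near 0.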

### System identifiability

To evaluate the possibility that more motion-resilient FC metrics might enable better detection of signals of interest, we consider the outcome measure of system identifiability^{17,45,46}. We use the term system to refer to a set of brain regions that are strongly functionally connected; and we use the phrase system identifiability to refer to the ease with which such systems can be detected from functional connectivity matrices. We employed the modularity quality index, *Q*, as a measure of system identifiability. The modularity quality index is a quantification of the extent to which a network can be subdivided into groups or modules characterized by strong intramodular connectivity and weak intermodular connectivity. Such modularity is indicative of the assortative community structure commonly found to be present in functional brain networks^{47,48}.

We estimated the modularity quality index for each subject’s network by maximizing the modularity quality function originally defined by Newman^{45} and subsequently extended to weighted and signed networks by Rubinov and Sporns^{49}, among others^{50,51}. For FC metrics resulting in weights falling within the interval [0,1], or results estimated from the absolute value of edge weights, we employed the weighted generalization of the modularity quality index. We first let the weight of a positive edge between nodes $i$ and $j$ be given by $w_{ij}^{+}$, and the strength of a node $i$, $s_i^{+} = \sum_j w_{ij}^{+}$, be given by the sum of the positive edge weights of $i$. We denote the chance expected within-module edge weights as $e_{ij}^{+} = s_i^{+} s_j^{+} / v^{+}$ for positive weights, where the total weight, $v^{+} = \sum_i s_i^{+}$, is the sum of all positive edge weights in the network. Then the weighted generalization of the modularity quality index is given by

$$Q = \frac{1}{v^{+}} \sum_{ij} \left(w_{ij}^{+} - \gamma\, e_{ij}^{+}\right) \delta_{M_i M_j},$$

where $M_i$ is the community to which node $i$ is assigned, and $M_j$ is the community to which node $j$ is assigned. The Kronecker delta function, $\delta_{M_i M_j}$, takes on a value of 1 when $M_i = M_j$ and a value of 0 when $M_i \neq M_j$. The tunable structural resolution parameter, $\gamma$, scales the relative importance of the expected within-module weights (the null model) and, in practice, the size of the communities: smaller or larger values of $\gamma$ result in correspondingly larger or smaller communities. We use a Louvain-like locally greedy algorithm^{52} as a heuristic to maximize this modularity quality index over partitions $M$ of nodes into communities.

For FC metrics resulting in weights falling in the interval [-1,1], specifically the Pearson and Spearman correlations, we employed the asymmetrically weighted generalization of $Q$ suitable for networks containing negative weights^{49}. Specifically, we follow Rubinov and Sporns^{49} by first letting the weight of a positive edge between nodes $i$ and $j$ be given by $w_{ij}^{+} = \max(w_{ij}, 0)$, the weight of a negative edge between nodes $i$ and $j$ be given by $w_{ij}^{-} = \max(-w_{ij}, 0)$, and the strength of a node $i$, $s_i^{\pm} = \sum_j w_{ij}^{\pm}$, be given by the sum of the positive or negative edge weights of $i$. We denote the chance expected within-module edge weights as $e_{ij}^{+} = s_i^{+} s_j^{+} / v^{+}$ for positive weights and $e_{ij}^{-} = s_i^{-} s_j^{-} / v^{-}$ for negative weights, where the total weight, $v^{\pm} = \sum_i s_i^{\pm}$, is the sum of all positive or negative edge weights in the network.

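Then an asymmetric signed generalization of the modularity quality index can be written as

$$Q^{*} = \frac{1}{v^{+}} \sum_{ij} \left(w_{ij}^{+} - \gamma\, e_{ij}^{+}\right) \delta_{M_i M_j} - \frac{1}{v^{+} + v^{-}} \sum_{ij} \left(w_{ij}^{-} - \gamma\, e_{ij}^{-}\right) \delta_{M_i M_j},$$

where $M_i$, $M_j$, $\delta_{M_i M_j}$, and $\gamma$ are defined as above.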
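The sketch below evaluates the signed index $Q^{*}$ for a given partition. It does not include the Louvain-like search that maximizes it. When a matrix has no negative weights, the negative term vanishes and $Q^{*}$ reduces to the weighted $Q$. The function name is hypothetical, and the function assumes the diagonal (self-connections) has already been set to zero. The toy two-module network illustrates the calculation.

```python
import numpy as np


def signed_modularity(w, labels, gamma=1.0):
    """Asymmetric signed modularity Q* of a given partition (weighted Q if w >= 0)."""
    w = np.asarray(w, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # Kronecker delta on community labels

    def within_module_term(w_part):
        s = w_part.sum(axis=1)  # node strengths s_i
        v = s.sum()  # total weight v
        e = np.outer(s, s) / v if v > 0 else np.zeros_like(w_part)  # null model e_ij
        return ((w_part - gamma * e) * same).sum(), v

    num_pos, v_pos = within_module_term(np.clip(w, 0, None))  # w_ij^+
    num_neg, v_neg = within_module_term(np.clip(-w, 0, None))  # w_ij^-
    return num_pos / v_pos - num_neg / (v_pos + v_neg)


# Toy network: two 3-node modules, +1 within modules, -1 between them
labels = np.array([0, 0, 0, 1, 1, 1])
w = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
np.fill_diagonal(w, 0.0)
q_signed = signed_modularity(w, labels)
q_positive_only = signed_modularity(np.clip(w, 0, None), labels)
q_wrong = signed_modularity(w, np.array([0, 1, 0, 1, 0, 1]))
```

Here the positive part alone gives $Q = 0.5$. Negative edges falling between modules raise $Q^{*}$ to 0.8, and a partition that cuts across the modules scores much lower.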

We examined average *Q* for each FC metric as a measure of system identifiability, as well as the partial correlation between *Q* and mean relative RMS for each subject while controlling for average network weight, age, and sex. Additionally, we addressed two potential confounds that have not been previously addressed in work examining *Q* as a measure of system identifiability: the number of communities *k* detected during modularity maximization, and the mean and distribution of edge weights in a given network (see Supplementary Methods for details).

## Results

### Characteristics of FC matrices computed using different methods

We first characterized the functional connectivity edge weights estimated using different methods. Figure 1 shows pairwise scatterplots between edge weights computed using all 6 methods. These plots show the non-linear relationships between edges estimated using correlation-based methods (Pearson and Spearman) and the other methods. Of particular interest is how negative edges in correlation-based methods map onto edge weights in the other methods. For instance, the weights of negative edges in Pearson matrices have an inverse relationship with the weights of edges in wavelet coherence matrices – the more negative a Pearson edge weight, the higher its wavelet coherence edge weight.

In Figure 2, FC matrices estimated using different methods are displayed as heatmaps, with canonical systems in the Gordon parcellation highlighted in the x- and y-axis color bars. Modular structure is clearly visible in all matrices, with canonical systems cleanly delineated. Further, in the correlation-based methods, well-known negative correlations are apparent, for instance between the default mode and dorsal attention systems^{53}.

### Correlation-based methods show high residual QC-FC correlations

Next, we evaluated the sensitivity of edge weights (computed using different methods) to subject motion. We used the residual QC-FC correlation benchmark, which measures the edgewise relationship between the relative mean RMS motion of each subject and their estimated edge weights, after the application of denoising pipelines.

Panels A and B of Figure 3 show the fraction of edges for which the QC-FC correlations are statistically significant (p<0.05, no correction for multiple comparisons). Correlation-based methods (Pearson and Spearman) perform poorly compared to coherence and information theory-based methods, with a relatively high fraction of edges displaying weights that are significantly associated with motion. If only the absolute values of edge weights are taken or if negative edges are set to zero, the performance of correlation-based measures improves, but is still worse than other measures (Figure S2, Figure S3). With CompCor preprocessing, wavelet coherence emerges as the best performing measure (Figure S4).

Figure 3C shows the distribution of QC-FC correlations for each FC estimation method. The distributions for correlation-based methods are wider than for other methods, confirming that more edges in correlation-based methods are significantly associated with motion.

### Motion differentially affects putative cognitive systems

Next, we analyzed the amount of motion artifact in edges connecting regions within specific putative cognitive systems. Figure 4 shows heatmaps of QC-FC correlations for all edges in the Gordon parcellation, arranged by the associated *a priori* defined systems^{38}. Heatmaps of QC-FC correlations averaged for each system pair are shown in Figure S5. We also computed pairwise inter- and intra-community QC-FC correlations and rank-ordered them by their median values. The six highest ranked inter-community QC-FC correlations are shown in Figure 5.

Our analysis reveals a number of interesting details about the differential vulnerability of brain systems to motion artifact. Edges connecting regions within a given system appear to be affected more than inter-system edges, with 58.3% of the top-ranking QC-FC correlations belonging to intra-system edges. Edges within the default mode (D-D) and the retrosplenial temporal systems (RT-RT) appear to be especially vulnerable to motion artifact regardless of FC estimation method (Figure 5, Figure S6). Notably, the fronto-parietal and auditory systems do not feature in the top six inter-system QC-FC correlations for any FC measure. Within the correlation-based FC metrics (Pearson and Spearman), edges between cingulo-opercular and visual (CO-V), and between default and visual (D-V) systems have high QC-FC correlations but are not affected as much in the other FC measures.

### Distance-dependence of motion artifact

Next, we evaluated the distance-dependence of motion artifact by computing the correlation, across edges, between the Euclidean distance separating each node pair and the absolute value of that edge’s QC-FC correlation. We find that correlation-based FC metrics (Pearson and Spearman) have higher positive distance-dependence than coherence and information theory-based methods, implying that long-distance edges are more affected by motion than short-distance edges (Figure 6). With CompCor preprocessing, most measures display relatively similar negative distance-dependence (Figure S7).

### Test-retest reliability of functional connectivity

To estimate the reproducibility of functional connectivity estimates obtained with different methods, we measured the intra-class correlation across 4 resting-state scans in the HCP dataset. Panels A and B of Figure 7 show that the intra-class correlations for correlation-based measures (Pearson and Spearman) are higher than for coherence and information theory-based methods. This relatively high reliability could be due to accurate estimates of trait-like biology or could be due to a sensitivity to a highly reliable third variable, such as motion. To determine whether the latter could be the case, we evaluated the reliability of subject motion. We found that the intra-class correlation for relative RMS motion was also high (0.7169), indicating that motion itself is reproducible across scans. In order to separate the reproducibility of motion from reproducibility of FC edges, we re-computed the intra-class correlations for edges that were in the bottom 20% of absolute QC-FC correlation values in all 4 scans. This analysis showed that correlation-based measures still had higher intra-class correlations than other measures after mitigating the influence of reliable motion, indicating that edges estimated using correlation-based methods are more reproducible over this time scale than edges estimated using other methods (Figure 7C, D).

### System Identifiability

Finally, we examined the extent to which different metrics of FC resulted in different levels of system identifiability, or identification of the coherent intramodular structure that is commonly found in functional brain networks^{47,48}. Panels A and B of Figure 8 show that on average, our measure of system identifiability, the modularity quality index, is highest in networks estimated using Pearson’s or Spearman’s correlation.

The high system identifiability in correlation-based metrics could be due to an accurate sensitivity to the biology of putative cognitive systems or could be due to motion impacting regions of the brain in a spatially heterogeneous manner that partially drives the data-driven identification of systems. To address these possibilities, we studied the relation between *Q* and motion. We find that the relationship between *Q* and motion is also strongest in the correlation-based metrics, even when controlling for average weight, age, and sex (Figure 8C, D). The modularity quality index estimated on networks containing only edges that were in the bottom 20% of absolute QC-FC correlation values in all 4 scans showed that *Q* is still highest in networks estimated with Pearson’s or Spearman’s correlation (Figure S8). We also estimated *Q* from networks containing only the absolute values of edge weights, which reduces but does not eliminate differences in system identifiability between correlation-based metrics and other metrics (Figure S9). Finally, to ensure that differences in system identifiability were not driven by differences in the functional systems detected when maximizing the modularity quality index, we also calculated *Q* using the canonical system partition associated with each of our 2 parcellations and obtained similar results (Figure S10).

To further ensure that results were not driven by variability in edge weight distributions across FC metrics, we examined two boundary cases from prior results as the FC metrics of interest: Pearson’s correlation and wavelet coherence. We reordered the edge weight values in our *weights* matrix (see Supplementary Methods) to reflect the rank order of weights in the ordering matrix. We estimated the modularity quality index Q of these reordered matrices, and found that Q was consistently higher when the ordering matrix was derived from Pearson’s correlations (Figure 9). Taken together, these results suggest that correlation-based FC metrics consistently result in higher levels of system identifiability, and that they may better reflect the modular architecture of functional brain networks than other methods.
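The full reordering procedure is given in the Supplementary Methods and is not reproduced here. The sketch below shows one way to implement the rank-order reassignment described above. It assumes symmetric matrices, considers only upper-triangle edges, and returns a zero diagonal. The helper name is hypothetical, and the matrices are synthetic. The output keeps the exact distribution of values in the *weights* matrix while adopting the edge ranking of the ordering matrix.

```python
import numpy as np


def reorder_by_rank(weights, ordering):
    """Reassign the upper-triangle values of `weights` to follow the rank order of `ordering`."""
    n = weights.shape[0]
    iu = np.triu_indices(n, k=1)
    values = np.sort(weights[iu])  # the weight distribution to preserve, ascending
    ranks = np.argsort(np.argsort(ordering[iu]))  # rank of each edge in `ordering`, 0 = smallest
    out = np.zeros_like(weights, dtype=float)
    out[iu] = values[ranks]  # k-th smallest weight goes to the k-th smallest ordering edge
    return out + out.T  # symmetrize; the diagonal stays zero


rng = np.random.default_rng(10)
a, b = rng.random((8, 8)), rng.random((8, 8))
weights, ordering = (a + a.T) / 2, (b + b.T) / 2
reordered = reorder_by_rank(weights, ordering)
iu = np.triu_indices(8, k=1)
```

Because only the ranks of the ordering matrix are used, any difference in $Q$ between the reordered matrices reflects which edges are strong, not the shape of the weight distribution.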

## Discussion

In this report, we systematically investigated the sensitivity to motion of six different FC estimation measures drawn from the correlation, coherence, and mutual information families, based on their performance on commonly used benchmarks. The context, implications, and limitations of our results are discussed below.

### Clear distinction between correlation-based FC measures and other measures

Our main finding is that correlation-based measures (Pearson and Spearman) result in a high fraction of edges significantly correlated with motion and a high distance-dependence of motion artifact compared to all other methods. These results imply that commonly used correlation-based measures are relatively susceptible to subject motion.

Head motion artifact predominantly manifests as spurious fluctuations in the BOLD signal across multiple voxels in the brain^{16,30}. Because traditional correlation-based measures of functional connectivity quantify temporal covariance, they are directly affected by the artifactual covariance introduced by head motion. On the other hand, coherence-based FC measures quantify statistical dependencies in the frequency domain, including phase locking and correlation in power spectra. We also evaluated mutual information in the frequency domain^{27}, which is an information theoretic measure of relationships in the frequency domain. Frequency-based FC measures are less likely to be influenced by short-lived temporal fluctuations in the BOLD signal. Thus, we posit that the statistical properties of correlation-based methods render them relatively more sensitive to temporal outliers introduced by head motion.

In our study, we averaged frequency-based connectivity estimates (coherence, wavelet coherence, and mutual information in frequency) within a low frequency band (0.009-0.08 Hz). Although information on the power spectral properties of motion artifact is limited, some prior studies have shown that motion affects the spectral power and connectivity estimates mainly at high frequencies^{54–56}. It is therefore possible that averaging connectivity estimates within a low frequency band reduced the impact of high-frequency motion artifact in these measures. Further, if motion artifact were concentrated at a particular frequency, averaging across the frequency band may have diluted its overall impact on the FC estimates.

When time series were denoised using the CompCor pipeline, wavelet coherence FC estimates had a substantially lower fraction of edges significantly correlated with motion compared to other methods. With CompCor preprocessing, residual QC-FC correlations were quite high on average, meaning that the denoised signals still contained substantial noise. In this context, the benefit of wavelet coherence might derive from the suitability of wavelet transform methods to model long-memory or 1/f-like processes^{23}. Wavelets act as near-optimal whitening or decorrelating filters for 1/f-like processes, which may make them less sensitive to the slow-drift type of head motion^{23,57–59}. Therefore, wavelet coherence may be a good choice for estimating functional connectivity from resting-state data that are substantially contaminated by motion.

We found that the performance of correlation-based methods on the QC-FC benchmark improved when taking the absolute values of edge weights, and when negative edges were set to zero (Figure S2, S3). Analysis of fully connected complex networks with positive and negative weights can be rigorously performed^{49}. However, the interpretation of negative correlations is controversial, especially in the context of global signal regression^{60–62}. As a result, many studies omit negative edges from analyses of functional and dynamic connectivity^{63,64}. Our results indicate that omitting negative edges or taking their absolute values might also reduce the susceptibility to motion artifact.

The systems whose edges were most affected by motion differed among FC estimation methods. For instance, connections between large-scale systems such as the default mode, cingulo-opercular, and visual systems were related to motion in correlation-based methods but not in the other FC methods. In contrast, within-system edges in the default mode system and retrosplenial temporal cortex were strongly related to motion in all FC estimation methods. This last result deserves attention, given the large number of scientific hypotheses surrounding the brain’s default mode system^{53,65}.
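Comparisons such as "within-system versus between-system" can be sketched by averaging an edge-level motion measure (for example, |QC-FC|) over system-by-system blocks. The system labels below (`"DMN"`, `"VIS"`), the helper name `system_block_means`, and the choice of a simple mean are illustrative assumptions. This section does not state how the study aggregated edges by system.

```python
import numpy as np


def system_block_means(edge_values, labels):
    """Average an edge-level matrix (e.g., |QC-FC|) within and between systems.

    edge_values: (n, n) symmetric matrix; labels: length-n sequence of system names.
    Returns (systems, block), where block[a, b] is the mean over edges linking
    system a to system b; diagonal self-connections are excluded.
    """
    edge_values = np.asarray(edge_values, dtype=float)
    labels = np.asarray(labels)
    systems = sorted(set(labels.tolist()))
    off_diag = ~np.eye(len(labels), dtype=bool)
    block = np.full((len(systems), len(systems)), np.nan)
    for a, sa in enumerate(systems):
        for b, sb in enumerate(systems):
            mask = np.outer(labels == sa, labels == sb) & off_diag
            if mask.any():  # a single-node system has no within-system edges
                block[a, b] = edge_values[mask].mean()
    return systems, block


abs_qcfc = np.array([[0.0, 0.5, 0.2, 0.2],
                     [0.5, 0.0, 0.2, 0.2],
                     [0.2, 0.2, 0.0, 0.1],
                     [0.2, 0.2, 0.1, 0.0]])
print(system_block_means(abs_qcfc, ["DMN", "DMN", "VIS", "VIS"]))
```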

The strong relationship between motion and default mode connectivity is unlikely to be due solely to geometry, because other networks whose edges are less correlated with motion, including the frontoparietal network, have a similar distribution of anterior and posterior nodes on both the medial and lateral surfaces. It is possible that individuals with specific patterns of default mode connectivity find it more difficult to remember to stay still as their minds wander^{66–68}. Retrosplenial cortex, another system that we found to have a strong relationship with motion, is often functionally integrated with the default mode network to support memory processes, but it also plays a role in spatial navigation and locomotion^{69–71}. Specifically, retrosplenial cortex integrates vestibular input, which encodes head position, with input from visual cortex to calibrate self-motion with visual motion signals^{72,73}. Our findings highlight the need to carefully consider the confounding effects of motion, as well as the causes of motion, when studying these systems.

Perhaps surprisingly, correlation-based measures also yielded significantly higher system identifiability than the other measures. These results held even when we restricted the analysis to the 20% of edges least affected by motion. This observation suggests that the higher system identifiability of time-based correlation methods is not solely driven by motion, and that these metrics, while highly motion-sensitive, might excel at detecting coherent community structure. Alternatively, these findings may hint that the well-established finding of a modular architecture in human functional brain networks is, to some degree, metric-dependent^{74,75}.
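Restricting the analysis to the least motion-affected edges can be sketched as follows. The sketch assumes that "least affected" means lowest absolute QC-FC correlation, as in the test-retest analysis. It also assumes that, when several scans are available, each edge is scored by its largest |QC-FC| across scans. Both helper names are hypothetical.

```python
import numpy as np


def upper_triangle(matrix):
    """Vectorize the unique off-diagonal edges of a symmetric (n, n) matrix."""
    matrix = np.asarray(matrix)
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def least_motion_edge_mask(qcfc, fraction=0.2):
    """Boolean mask selecting the `fraction` of edges with the smallest |QC-FC|.

    qcfc: (n_edges,) or (n_scans, n_edges) QC-FC correlations. With several
    scans, each edge is scored by its largest |QC-FC| across scans (an
    assumption), so an edge is kept only if |QC-FC| is small in every scan.
    """
    scores = np.max(np.abs(np.atleast_2d(qcfc)), axis=0)
    n_keep = int(round(fraction * scores.size))
    keep = np.zeros(scores.size, dtype=bool)
    keep[np.argsort(scores, kind="stable")[:n_keep]] = True
    return keep


qcfc_two_scans = np.array([[0.10, -0.50, 0.02, 0.30, -0.05],
                           [0.20, 0.10, -0.03, 0.01, 0.04]])
print(least_motion_edge_mask(qcfc_two_scans, fraction=0.4))
```

System identifiability or reliability would then be recomputed using only the edges where the mask is `True`.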

The success with which correlation-based measures detect modular architecture may be due to the presence of negative edge weights, or anticorrelations, in these measures, which reduce the net inter-module connectivity that enters the modularity quality calculation. A negative edge calculated using Pearson correlation, for instance between the default mode and dorsal attention systems, reduces the overall inter-module connectivity, sharpening the boundaries between modules. However, the same edge calculated using coherence is highly positive. Because coherence is nonnegative by construction, such an edge can only add to inter-module connectivity, blurring the boundaries between modules. Similarly, taking absolute values of correlation-based edges converts negative edges into positive ones and thereby reduces the modularity quality index. The presence of negative edges, or anticorrelation, between internal and external attention systems has been argued to reflect a functional toggle between systems^{61,76–79}. It is therefore unclear which method best captures true interactions between systems. Indeed, both could reflect complementary aspects of network dynamics if dorsal attention activation lags default mode activation by a consistent delay: magnitude coherence discards phase, so such a lagged relationship could register as strong coherence even where the zero-lag correlation is negative.
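The effect of flipping negative inter-module edges can be illustrated on a four-node toy network with two modules. The sketch uses standard weighted Newman-Girvan modularity, which assumes nonnegative weights. It therefore compares the two nonnegative variants (negatives set to zero versus absolute values) rather than the signed matrix. A signed modularity variant, which the study may have used, is not shown. The weights 0.8 and -0.5 are arbitrary illustrative values.

```python
import numpy as np


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition for a nonnegative weighted graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    strength = adj.sum(axis=1)           # weighted degree of each node
    two_m = adj.sum()                    # total weight, counting each edge twice
    same = labels[:, None] == labels[None, :]
    return float(((adj - np.outer(strength, strength) / two_m) * same).sum() / two_m)


# Toy signed FC: modules {0, 1} and {2, 3}, positive within and negative between.
within, between = 0.8, -0.5
fc = np.array([
    [0.0, within, between, between],
    [within, 0.0, between, between],
    [between, between, 0.0, within],
    [between, between, within, 0.0],
])
labels = np.array([0, 0, 1, 1])

q_positive = modularity(np.where(fc < 0, 0.0, fc), labels)  # negatives set to zero
q_absolute = modularity(np.abs(fc), labels)                 # negatives flipped to positive
print(f"Q with negatives zeroed: {q_positive:.3f}")  # 0.500
print(f"Q with absolute values:  {q_absolute:.3f}")  # -0.056
```

Once the negative between-module edges become positive, they count as inter-module connectivity, and Q falls from 0.5 to slightly below zero for the same partition.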

Lastly, correlation-based measures scored higher on test-retest reliability than the other methods. While this was partly driven by the well-known reproducibility of motion itself^{80,81}, we observed similar results when we restricted our analysis to the 20% of edges with the lowest absolute QC-FC correlations in all four scans. This additional finding suggests that correlation-based methods, while highly sensitive to motion, yield relatively reproducible functional connectivity estimates.
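This section does not name the reliability statistic. As one common choice, assumed here for illustration only, the edge-wise intraclass correlation ICC(1,1) (one-way random effects, single measure) can be computed across subjects and the four scans. `icc_1_1` is a hypothetical helper and is not claimed to be the study's metric.

```python
import numpy as np


def icc_1_1(data):
    """One-way random-effects, single-measure ICC(1,1).

    data: (n_subjects, k_scans) array holding one edge's weight per subject and scan.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


rng = np.random.default_rng(0)
trait = rng.standard_normal((300, 1))                 # stable subject-level edge strength
scans = trait + 0.5 * rng.standard_normal((300, 4))   # four noisy repeat scans
print(round(icc_1_1(scans), 2))
```

In this simulation the population ICC is 1 / (1 + 0.5²) = 0.8, so the estimate should land near that value. The statistic would be computed once per edge, optionally only over edges retained by a motion-based mask.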

### Implications for researchers

We have shown that moving away from standard correlation-based FC measures can improve the robustness of FC estimates to head motion. However, correlation-based FC measures excel at detecting community structure and are highly reliable. Our findings indicate that the FC estimation method should be chosen carefully based on the nature of the study. For instance, group-comparison studies, in which motion artifact can introduce systematic bias into connectivity estimates^{16}, could benefit from frequency-based FC estimation methods such as coherence. In contrast, correlation-based metrics would likely remain the appropriate choice for studies of modular brain architecture.

Our results also highlight spatial heterogeneity in the impact of motion. FC edges in the default mode system and retrosplenial cortex were especially sensitive to the effects of motion. Studies that explore the fine-scale organization and function of these networks could benefit from comparing several FC estimation methods.

Finally, correlation-based metrics are computationally less expensive than the rest of the measures reported here and would therefore be preferable for large datasets. Other metrics can be used for smaller datasets where computational time is not prohibitive.

### Limitations

It is prudent to mention several limitations of our study. First, the lack of a noise-free ground truth is a challenge when estimating the impact of motion artifact in real fMRI data. It is difficult to separate true signal from noise in fMRI data, a challenge further complicated by findings that head motion is a stable trait that is likely related to an individual’s physiology and neural dynamics^{82,83}. Second, because of this lack of ground truth, it is necessary to rely on indirect benchmarks such as QC-FC correlations and QC-FC distance-dependence. Central to the computation of these benchmarks is a single summary of head motion for the whole scan, averaged from realignment estimates^{16}. This average measure can miss important spatiotemporal details of motion. Future studies could use voxel-wise displacement maps to extract more detailed information about motion and its impact on FC^{55}. Third, some recent studies have highlighted qualitative differences in motion estimates and physiological noise parameters between datasets with fast sampling rates and older datasets with longer TRs^{84,85}. Thus, estimates of head motion from realignment parameters, including those used in the current study, may need to be modified for datasets with shorter TRs. Fourth, we used data from the Human Connectome Project that were preprocessed using ICA-FIX. This denoising approach has been shown to be particularly effective with HCP data^{32}. In the future, it might be beneficial to investigate the effect of varying FC estimation methods in noisier datasets processed with different denoising pipelines. Further, because we imposed a fairly stringent motion exclusion threshold, it is unclear whether our results generalize to samples with higher motion, including pediatric, geriatric, or psychiatric samples.
Finally, we did not evaluate FC estimation methods from many statistical families, including Bayes nets, Granger causality, and generalized synchronization. Further, we restricted analysis to full correlations, and did not investigate partial correlation methods. Future studies could investigate additional families of FC estimation methods omitted from this study.
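The QC-FC and QC-FC distance-dependence benchmarks discussed in these limitations can be sketched as follows. QC-FC is the across-subject correlation between each edge and a per-subject motion summary. Distance-dependence is the correlation between those QC-FC values and inter-node distance. The sketch rests on several assumptions that this section does not confirm:

- Power-style mean framewise displacement (FD), with rotations in radians converted to arc length on a 50 mm sphere, serves as the motion summary.
- QC-FC uses plain Pearson correlation. Many pipelines instead use a partial correlation that controls for covariates such as age and sex.
- Distance-dependence uses Spearman correlation.

All helper names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def mean_framewise_displacement(realign, head_radius_mm=50.0):
    """Mean Power-style FD from (n_frames, 6) realignment parameters.

    Assumed column order: 3 translations (mm), then 3 rotations (radians).
    Rotations are converted to arc length on a sphere of radius head_radius_mm.
    """
    deltas = np.abs(np.diff(np.asarray(realign, dtype=float), axis=0))
    deltas[:, 3:] *= head_radius_mm
    return float(deltas.sum(axis=1).mean())


def qc_fc(edge_weights, motion):
    """Across-subject Pearson correlation of each edge with motion.

    edge_weights: (n_subjects, n_edges); motion: (n_subjects,), e.g. mean FD.
    """
    ew = edge_weights - edge_weights.mean(axis=0)
    m = motion - motion.mean()
    return (ew * m[:, None]).sum(axis=0) / (
        np.sqrt((ew ** 2).sum(axis=0)) * np.sqrt((m ** 2).sum())
    )


def qcfc_distance_dependence(qcfc_values, edge_distances):
    """Spearman correlation between QC-FC values and inter-node distances."""
    rho, _ = spearmanr(edge_distances, qcfc_values)
    return float(rho)


# Synthetic example: motion inflates short-range edges more than long-range ones.
rng = np.random.default_rng(0)
n_subjects, n_regions = 80, 10
coords = rng.uniform(-70, 70, size=(n_regions, 3))  # hypothetical node centroids (mm)
rows, cols = np.triu_indices(n_regions, k=1)
distances = np.linalg.norm(coords[rows] - coords[cols], axis=1)
fd = rng.gamma(2.0, 0.08, size=n_subjects)          # hypothetical mean FD per subject
edges = rng.standard_normal((n_subjects, distances.size)) + np.outer(fd, 100.0 / distances)
qcfc_values = qc_fc(edges, fd)
print("median |QC-FC|:", round(float(np.median(np.abs(qcfc_values))), 3))
print("QC-FC distance-dependence:", round(qcfc_distance_dependence(qcfc_values, distances), 3))
```

Because the scan-wide FD average collapses each subject's motion to one number, any benchmark built on it inherits the loss of spatiotemporal detail noted above.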

### Citation diversity statement

Recent work in neuroscience and other fields has identified a bias in citation practices such that papers from women and other minorities are under-cited relative to the number of such papers in the field^{86–91}. Here we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, and other factors. We used automatic classification of gender based on the first names of the first and last authors^{89,92}, with possible combinations including male/male, male/female, female/male, and female/female. Excluding self-citations to the first and last authors of our current paper, the references contain 53.1% male/male, 9.9% male/female, 23.5% female/male, 13.6% female/female. We look forward to future work that could help us to better understand how to support equitable practices in science.

### Data/code availability statement

All data used in this manuscript come from the openly available Human Connectome Project. Code associated with this manuscript is available at https://github.com/arunsm/motion-FC-metrics.git

### Disclosure of competing interests

The authors declare no competing interests.

## Author contributions

Arun S. Mahadevan: Methodology, Software, Formal Analysis, Visualization, Writing – Original Draft Preparation

Ursula Tooley: Methodology, Software, Formal Analysis, Visualization, Writing – Original Draft Preparation

Maxwell P. Bertolero: Data curation, Writing – Original Draft Preparation

Allyson P. Mackey: Supervision, Writing – Reviewing and Editing

Danielle S. Bassett: Conceptualization, Supervision, Formal Analysis, Writing – Reviewing and Editing

## Acknowledgements

We thank Linden Parkes for helpful discussions. U.A.T. was supported by the National Science Foundation Graduate Research Fellowship. A.P.M. was supported by a Jacobs Foundation Early Career Research Fellowship and the National Institute on Drug Abuse (1R34DA050297-01). ASM was primarily supported by the Paul G. Allen Family Foundation and the Army Research Office (Falk W911NF-18-1-0244). DSB would also like to acknowledge the John D. and Catherine T. MacArthur Foundation, the ISI Foundation, the Army Research Laboratory (W911NF-10-2-0022), the Army Research Office (Bassett-W911NF-14-1-0679, Grafton-W911NF-16-1-0474), the National Science Foundation (BCS1631550, PHY-1554488, NCS-FO-1926829), the National Institute of Mental Health (2-R01-DC-009209-11, R01-MH112847, R01-MH107235, R21-M MH-106799, R01-MH-116920), and the National Institute of Child Health and Human Development (1R01HD086888-01). The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

## References

- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.