On the variability of dynamic functional connectivity assessment methods

Dynamic functional connectivity (dFC) has become an important measure for understanding brain function and as a potential biomarker. However, various methodologies have been developed for assessing dFC, and it is unclear how the choice of method affects the results. In this work, we aimed to study the results variability of commonly-used dFC methods. We implemented seven dFC assessment methods in Python and used them to analyze fMRI data of 395 subjects from the Human Connectome Project. We measured the pairwise similarity of dFC results using several similarity metrics in terms of overall, temporal, spatial, and inter-subject similarity. Our results showed a range of weak to strong similarity between the results of different methods, indicating considerable overall variability. Surprisingly, the observed variability in dFC estimates was comparable to the expected natural variation over time, emphasizing the impact of methodological choices on the results. Our findings revealed three distinct groups of methods with significant inter-group variability, each exhibiting distinct assumptions and advantages. These findings highlight the need for multi-analysis approaches to capture the full range of dFC variation. They also emphasize the importance of distinguishing neural-driven dFC variations from physiological confounds, and developing validation frameworks under a known ground truth. To facilitate such investigations, we provide an open-source Python toolbox that enables multi-analysis dFC assessment. This study sheds light on the impact of dFC assessment analytical flexibility, emphasizing the need for careful method selection and validation, and promoting the use of multi-analysis approaches to enhance reliability and interpretability of dFC studies.

Alzheimer's disease (AD): [14]). In recent years, a variety of methodologies for assessing dFC have been developed [4,15]. As the number of available methods continues to grow, there is an increasing need to comprehensively review these methodologies and examine their relative advantages and disadvantages, as well as con-

Euclidean distance, and mutual information. We evaluated

Additionally, we compared the correspondence between the inter-subject correlations of different methods to obtain the inter-subject similarity:

words, dFC similarity variability over method pairs was five times higher than over subjects. We used hierarchical clustering analysis on the pair-wise similarity matrices to investigate and summarize variation be-

Network. We also compared the variability over method with the vari-

It is worth noting that in our study, we adopted the hyperpa-

In this document, we present supplementary data that offers additional insights into the implemented methods, as well as

Fig. 10. The average dFC, or static FC, for each method was calculated by averaging the dFC matrices over time and subjects, and then normalizing the values by ranking them to make the results more comparable. While the overall pattern of the static FC matrices was similar across methods, there was some degree of dissimilarity observed between the matrices produced by TF compared to the other methods.
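The averaging and rank normalization described for Fig. 10 can be illustrated with a minimal sketch; it is not the toolbox's implementation. It assumes the dFC of one method is stored as a NumPy array of shape (subjects, time, regions, regions) with symmetric matrices. The function name `rank_normalized_static_fc` and the rescaling of ranks to [0, 1] are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.stats import rankdata


def rank_normalized_static_fc(dfc):
    """Average dFC over subjects and time, then replace values by scaled ranks.

    dfc: array of shape (subjects, time, regions, regions), assumed symmetric.
    Returns a symmetric (regions, regions) matrix with off-diagonal ranks in [0, 1].
    """
    static_fc = dfc.mean(axis=(0, 1))
    n_regions = static_fc.shape[0]
    rows, cols = np.triu_indices(n_regions, k=1)
    # Rank each unique connection once (ties receive their average rank).
    ranks = rankdata(static_fc[rows, cols])
    # Hypothetical choice: rescale ranks to [0, 1] so methods share one scale.
    ranks = (ranks - 1) / (ranks.size - 1)
    out = np.zeros((n_regions, n_regions))
    out[rows, cols] = ranks
    return out + out.T  # mirror the upper triangle; the diagonal stays at zero
```

Because only the ordering of connections survives the ranking, two methods whose static FC values differ in scale but not in order would yield identical matrices under this sketch.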

Temporal variance of dFC.
Fig. 11. The temporal variability of dFC matrices was evaluated for each method by computing the variance of the dFC matrices over time, averaged over subjects. The results were rank-normalized to make them more comparable. The resulting temporal variance matrices revealed significant differences between methods. This can be observed by comparing the high-variance and low-variance functional connections yielded by each method, indicating that functional connections that are considered dynamic by one method may not be considered dynamic by another method.
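One way to make the last point of the Fig. 11 caption concrete is to compare which connections each method ranks as most variable. The sketch below is an illustrative assumption, not an analysis reported in the study: `temporal_variance` follows the description in the caption (variance over time, averaged over subjects), while `top_variance_overlap` and its `frac` parameter are hypothetical helpers that measure the Jaccard overlap of the highest-variance connections of two methods.

```python
import numpy as np


def temporal_variance(dfc):
    """Variance over time, averaged over subjects, for each unique connection.

    dfc: array of shape (subjects, time, regions, regions).
    Returns a 1-D array over the upper-triangle connections.
    """
    var_map = dfc.var(axis=1).mean(axis=0)
    rows, cols = np.triu_indices(var_map.shape[0], k=1)
    return var_map[rows, cols]


def top_variance_overlap(var_a, var_b, frac=0.1):
    """Jaccard overlap between the top-`frac` highest-variance connections.

    Hypothetical summary: 1.0 means both methods flag the same connections
    as most dynamic, 0.0 means they flag disjoint sets.
    """
    k = max(1, int(round(frac * var_a.size)))
    top_a = set(np.argsort(var_a)[-k:].tolist())
    top_b = set(np.argsort(var_b)[-k:].tolist())
    return len(top_a & top_b) / len(top_a | top_b)
```

A low overlap between two methods would correspond to the situation described in the caption, where a connection counted as dynamic by one method is not counted as dynamic by the other.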

Inter-state spatial similarity in state-based methods.

More in-depth exploration of the similarity between dFC assessment methods

assessments conducted in the main study.

Distribution of the similarity of dFC assessment results obtained by Spearman correlation over subjects.
Fig. 20. The violin plots display the distribution of overall similarity between dFC matrices for each method pair obtained by Spearman correlation over all subjects. This plot shows that the similarity between the results of some pairs of methods can take considerably different values in different subjects, ranging from weak to medium, and even relatively strong in some cases. This is mostly the case for the similarity between TF and the other methods. On the other hand, pairs like DHMM-SWC have a high similarity value in all subjects, and pairs such as WL-TF and WL-SW exhibit low similarity values for all subjects. The distributions were estimated using the kernel density estimation (KDE) algorithm.
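The per-subject similarity values and their estimated distribution in Fig. 20 can be sketched as follows. This is a minimal illustration under stated assumptions: the two methods' dFC arrays for one subject are taken to be aligned in time with shape (time, regions, regions), the overall similarity is taken to be the Spearman correlation of the flattened upper-triangle values across all time points, and the KDE uses `scipy.stats.gaussian_kde` with its default bandwidth. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde, spearmanr


def overall_similarity(dfc_a, dfc_b):
    """Spearman correlation between two dFC arrays of shape (time, regions, regions)."""
    rows, cols = np.triu_indices(dfc_a.shape[-1], k=1)
    # Keep each connection once per time point, then flatten time and space together.
    vec_a = dfc_a[:, rows, cols].ravel()
    vec_b = dfc_b[:, rows, cols].ravel()
    rho, _ = spearmanr(vec_a, vec_b)
    return rho


def similarity_density(similarities, grid):
    """Evaluate a Gaussian KDE of per-subject similarity values on a grid."""
    return gaussian_kde(similarities)(grid)


# Synthetic stand-in data: 20 "subjects", each with two noisy views of one dFC.
rng = np.random.default_rng(0)
values = []
for _ in range(20):
    base = rng.standard_normal((30, 6, 6))
    noisy = base + rng.standard_normal((30, 6, 6))
    values.append(overall_similarity(base, noisy))
density = similarity_density(np.array(values), np.linspace(-1, 1, 201))
```

The `density` curve plays the role of one violin in the figure; the width of a violin reflects how much the similarity of a method pair varies across subjects.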

Hierarchical Clustering Structure across subjects.
Fig. 21. The five most common hierarchical clustering structures across subjects, obtained using the overall similarity matrix of each subject. For each structure, the average and standard deviation interval of the distances are shown on the hierarchical structure. These structures occurred in 110, 144, 21, 66, and 20 subjects (out of 395), respectively. They were selected from a total of 15 existing structures across subjects, as they occurred in at least 10 subjects. The second structure, which was the most common one, is the same as the structure obtained using the average overall similarity of all subjects (Figure 3b). The variation of structures is mostly due to variation in the similarity of Time-Freq, and to a lesser degree Continuous HMM, with other methods.

The results indicate a considerable degree of variability across methods, with correlation values generally falling in the weak range. For example, the closest methods based on the overall similarity, such as SWC and DHMM, show weaker correlation compared to when other similarity metrics were used, and CHMM exhibits weak similarity to other methods. However, the overall intra and inter-group similarities, except for CHMM, were similar to those obtained by the overall similarity.
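The hierarchical clustering of a pairwise similarity matrix, as used for the structures in Fig. 21, can be sketched with SciPy. This is a hedged illustration rather than the study's exact procedure: converting similarity to distance as `1 - similarity`, using average linkage, and cutting the tree into a fixed number of groups are all assumptions made here, and the toy matrix uses placeholder methods A to D with invented values.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_methods(similarity, n_groups=3):
    """Cluster methods from a symmetric similarity matrix.

    Assumptions: distance = 1 - similarity, average linkage, and a cut of the
    tree into at most `n_groups` groups.
    """
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)  # a method is at distance zero from itself
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return tree, labels


# Toy similarity matrix for four placeholder methods A, B, C, D (invented values).
toy_similarity = np.array([
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
])
toy_tree, toy_labels = cluster_methods(toy_similarity, n_groups=2)
```

Running the same function on each subject's similarity matrix and comparing the resulting trees is one way to count how often a given structure occurs across subjects.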

Spatial vs. Temporal
in the main study. It includes a comparison of spatial and temporal similarity values and additional analyses with randomization tests to compare the actual similarity values, obtained from the original dFC matrices, with those obtained using dFC matrices that were randomly shuffled in time, space, or both. These randomization tests provide more insight into the temporal and spatial alignment of the dFC results obtained by the implemented methods.

Spatial vs. temporal similarity of method pairs.
Fig. 31. Scatter plot comparing the spatial and temporal similarity of the examined dFC assessment methods. The majority of the method pairs show a linear relationship between their spatial and temporal similarity, with higher temporal similarity corresponding to higher spatial similarity. However, the temporal similarity generally tended to be weaker than the spatial similarity. Notably, DHMM-SWC exhibited the highest levels of both spatial and temporal similarity, while DHMM-CHMM and CHMM-SWC had high spatial similarity but low temporal similarity. Additionally, WL-CAP, despite having an average spatial similarity, had a higher temporal similarity compared to most of the other pairs.
Comparison of actual similarities with similarity between dFC matrices with shuffled time points.
Fig. 32. The violin plots display the distribution of overall similarity between dFC matrices for each method pair after 10 different random shufflings of the time points for each subject (totaling 3,950 samples). The red dots indicate the average of actual similarities over subjects. For certain pairs, such as WL-CAP, DHMM-SWC, and SW-SWC, the average values of actual similarity were significantly higher than the average similarity of shuffled dFC matrices, implying that the temporal alignment or synchrony of dFC matrices was significant. This observation for pairs such as WL-CAP also suggests that their low overall similarity is likely due to dissimilar spatial patterns rather than dissimilar temporal dynamics. Conversely, for other pairs like DHMM-CHMM, the average values of similarity were not significantly greater, suggesting that their overall similarity was mainly attributed to the similarity of their spatial patterns, or due to their smooth transitions, and that their temporal similarity did not significantly contribute to the overall similarity. The distributions were estimated using the kernel density estimation (KDE) algorithm.
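The time-shuffling test behind Fig. 32 can be sketched as below. This is a minimal illustration, assuming that each method's dFC for one subject is an array of shape (time, regions, regions), that the two arrays are permuted independently along the time axis, and that similarity is the Spearman correlation of the flattened upper-triangle values. The function names are hypothetical; the default of 10 shuffles follows the caption.

```python
import numpy as np
from scipy.stats import spearmanr


def flat_similarity(dfc_a, dfc_b):
    """Spearman correlation over all time points and unique connections."""
    rows, cols = np.triu_indices(dfc_a.shape[-1], k=1)
    rho, _ = spearmanr(dfc_a[:, rows, cols].ravel(), dfc_b[:, rows, cols].ravel())
    return rho


def time_shuffled_similarities(dfc_a, dfc_b, n_shuffles=10, seed=0):
    """Similarity values after independently permuting the time axis of both inputs.

    Shuffling keeps every FC matrix intact but destroys the temporal alignment
    between the two methods (assumed null model).
    """
    rng = np.random.default_rng(seed)
    n_time = dfc_a.shape[0]
    null = []
    for _ in range(n_shuffles):
        perm_a = rng.permutation(n_time)
        perm_b = rng.permutation(n_time)
        null.append(flat_similarity(dfc_a[perm_a], dfc_b[perm_b]))
    return np.array(null)
```

If the actual similarity of a pair clearly exceeds the values returned by `time_shuffled_similarities`, the temporal alignment contributes to the similarity; if the two are comparable, the similarity is mostly carried by the spatial patterns.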

32 | bioRχiv Torabi et al. | Variability of dFC assessment methods
Comparison of actual similarities with similarity between dFC matrices with shuffled functional connections.
Fig. 33. The violin plots display the distribution of overall similarity between dFC matrices for each method pair after 10 different random shufflings of regions, and hence functional connections, for each subject (totaling 3,950 samples). The red dots indicate the average of actual similarities over subjects. For all pairs, the average value of similarity was greater than the average similarity of shuffled dFC matrices, implying that their spatial patterns, or FC matrices, at each time point were more similar than the average of randomly shuffled dFC matrices. For certain pairs, such as CHMM-SWC, DHMM-SWC, and DHMM-CHMM, the observed similarity values greatly exceeded those of the shuffled dFC matrices, indicating that their spatial patterns are significantly more similar than those generated randomly. Conversely, for pairs such as TF-CAP and WL-TF, the actual similarity values were comparable to the randomized similarity values. The distributions were estimated using the kernel density estimation (KDE) algorithm.
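Shuffling regions, and hence functional connections, as described for Fig. 33 amounts to applying one random permutation to both the rows and the columns of every FC matrix. The sketch below assumes a dFC array of shape (time, regions, regions) and that the same permutation is applied at all time points; the function name `shuffle_regions` is hypothetical.

```python
import numpy as np


def shuffle_regions(dfc, rng):
    """Relabel regions with one random permutation applied to rows and columns.

    dfc: array of shape (time, regions, regions).
    The set of FC values and the symmetry of each matrix are preserved;
    only the assignment of values to region pairs changes.
    """
    perm = rng.permutation(dfc.shape[-1])
    return dfc[:, perm][:, :, perm]


# Synthetic example: shuffle one symmetric dFC array.
rng = np.random.default_rng(0)
example = rng.standard_normal((20, 6, 6))
example = example + example.transpose(0, 2, 1)  # make each matrix symmetric
shuffled = shuffle_regions(example, rng)
```

Because the permutation keeps the temporal order intact, comparing a method's original dFC with another method's region-shuffled dFC isolates the contribution of spatial patterns to their similarity.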

Comparison of actual similarities with similarity between dFC matrices with shuffled time points and functional connections.
Fig. 34. The violin plots display the distribution of overall similarity between dFC matrices for each method pair after 10 different random shufflings of both time points and functional connections for each subject (totaling 3,950 samples). When both time points and functional connections were shuffled, both the spatial and temporal patterns of dFC matrices were removed. The red dots indicate the average of actual similarities over subjects. The results suggest that for all pairs of methods the actual similarity was greater than the average similarity of shuffled dFC matrices, implying that dFC matrices of all pairs were more similar than two dFC matrices with randomized temporal and spatial patterns. However, for certain method pairs such as TF-CAP and WL-TF, the actual similarity values were comparable to the average similarity of random dFC matrices. The distributions were estimated using the kernel density estimation (KDE) algorithm.

Comparison of actual similarities with similarity between dFC matrices with random state time courses.
Fig. 35. The violin plots depict the distribution of overall similarity between randomly generated dFC matrices. The randomly generated dFC matrices of each method were obtained by generating a random state time course and using the obtained spatial patterns of the method. For each subject, 10 random dFC matrices were generated by using the actual FC matrices, or spatial patterns, of each method, resulting in 3,950 samples for each method pair. For SW and TF, all spatial patterns occurring in the subject's dFC matrices were used (38 patterns). The red dots show the average of actual similarities over subjects. For certain pairs such as WL-CAP and DHMM-SWC, the average value of the actual similarity was significantly higher than the average similarity of randomly generated dFC matrices, indicating that the alignment of their dFC matrices was significantly stronger than that of two randomly generated dFC matrices with the same spatial patterns but random state time courses. Conversely, for pairs such as DHMM-CHMM, the same comparison suggests that any random sequence of the same spatial patterns may have a similar level of overall similarity as their actual dFC matrices. This implies that a large portion of their overall similarity stems from the similarity between their spatial patterns rather than the smoothness of transitions or temporal synchrony. The distributions were estimated using the kernel density estimation (KDE) algorithm.

and time. Their variance over time and method was calculated by computing the variance over the time axis and method axis respectively, and averaging over the other axis. The scatter plots reveal a wide range of behaviors across method pairs. For some pairs, such as DHMM-SWC and WL-CAP, the ratio of the variation over methods to the temporal variation was low, while for pairs such as WL-DHMM and TF-DHMM, the ratio was closer to one.
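The randomly generated dFC matrices described for Fig. 35 can be sketched as follows. The caption states that a random state time course is combined with a method's actual spatial patterns; how the state sequence is drawn is not specified there, so the independent, uniform sampling used below is an assumption, as are the function name `random_state_dfc` and the array layout (patterns of shape (states, regions, regions)).

```python
import numpy as np


def random_state_dfc(patterns, n_time, rng):
    """Build a dFC array from a random state time course.

    patterns: array of shape (states, regions, regions) holding the spatial
        patterns (FC matrices) of one method.
    Assumption: each time point picks a state independently and uniformly.
    Returns the dFC array of shape (n_time, regions, regions) and the states.
    """
    states = rng.integers(0, patterns.shape[0], size=n_time)
    return patterns[states], states


# Synthetic example with 4 hypothetical patterns over 6 regions.
rng = np.random.default_rng(0)
toy_patterns = rng.standard_normal((4, 6, 6))
random_dfc, random_states = random_state_dfc(toy_patterns, n_time=25, rng=rng)
```

Uniform independent sampling removes both temporal synchrony and smooth transitions, which is why similarity that survives this procedure can be attributed to the spatial patterns alone.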
Right: Ratios of variance over methods to variance over subjects across functional connections, averaged over time. To facilitate comparison, one was subtracted from each ratio (ratio − 1), so that zero values (white) represent equal variance over methods and subjects. The figure highlights pairs of RSNs with functional connections that exhibited higher variability over methods (red), as well as RSNs that exhibited higher variability over subjects (blue). Prior to computing the ratios, the variance values for functional connections belonging to each RSN pair were averaged to obtain a single variance value. The plot indicates that functional connections between RSNs such as the Default Mode and Parieto-occipital Networks, Default Mode and Cingulo-Opercular Networks, as well as those within the Default Mode, FrontoParietal and Cingulo-Opercular Networks demonstrated relatively equal variance over subjects and methods. Conversely, functional connections within RSNs, such as the Visual and Auditory networks, as well as those between the Ventral Attention and Medial Parietal Networks, exhibited higher variation over methods compared to subjects.
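The ratio map described above can be sketched as follows, under stated assumptions. The dFC of all methods and subjects is taken to be one array of shape (methods, subjects, time, regions, regions), and `rsn_of_region` is a hypothetical integer label per region. Following the caption, variances are averaged within each RSN pair before the ratio is formed and one is subtracted. Including diagonal entries in within-RSN blocks is a simplification of this sketch.

```python
import numpy as np


def rsn_variance_ratio(dfc, rsn_of_region):
    """Variance over methods divided by variance over subjects, minus one, per RSN pair.

    dfc: array of shape (methods, subjects, time, regions, regions).
    rsn_of_region: integer array of shape (regions,) with labels 0..n_rsn-1.
    """
    # Variance along one axis, averaged over the remaining non-spatial axes.
    var_method = dfc.var(axis=0).mean(axis=(0, 1))   # (regions, regions)
    var_subject = dfc.var(axis=1).mean(axis=(0, 1))  # (regions, regions)
    n_rsn = int(rsn_of_region.max()) + 1
    ratio = np.zeros((n_rsn, n_rsn))
    for i in range(n_rsn):
        for j in range(n_rsn):
            block = np.ix_(rsn_of_region == i, rsn_of_region == j)
            # Average the variances within the RSN pair first, then take the ratio.
            ratio[i, j] = var_method[block].mean() / var_subject[block].mean() - 1.0
    return ratio
```

Positive entries correspond to RSN pairs whose connections vary more over methods than over subjects, negative entries to the opposite case, and zero to equal variance.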
study. Most studies employed only one of these methods, and in most cases the choice was not justified further. In very few cases multiple (two or three) methods were used. The applications include clinical and cognitive applications, as well as other applications such as the investigation of brain functional organization. These studies were found by searching the