Extracting interpretable signatures of whole-brain dynamics through systematic comparison

The brain’s complex distributed dynamics are typically quantified using a limited set of manually selected statistical properties, leaving the possibility that alternative dynamical properties may outperform those reported for a given application. Here, we address this limitation by systematically comparing diverse, interpretable features of both intra-regional activity and inter-regional functional coupling from resting-state functional magnetic resonance imaging (rs-fMRI) data, demonstrating our method using case–control comparisons of four neuropsychiatric disorders. Our findings generally support the use of linear time-series analysis techniques for rs-fMRI case–control analyses, while also identifying new ways to quantify informative dynamical fMRI structures. While simple statistical representations of fMRI dynamics performed surprisingly well (e.g., properties within a single brain region), combining intra-regional properties with inter-regional coupling generally improved performance, underscoring the distributed, multifaceted changes to fMRI dynamics in neuropsychiatric disorders. The comprehensive, data-driven method introduced here enables systematic identification and interpretation of quantitative dynamical signatures of multivariate time-series data, with applicability beyond neuroimaging to diverse scientific problems involving complex time-varying systems.


Introduction
The world is replete with complex systems, in which individual components exhibit localized activity that dynamically and flexibly gives rise to emergent macro-scale behaviors-from consumers participating in the global economy [1] to birds migrating as part of a flock [2].One such system of great biological interest is the brain, wherein individual neuronal populations yield local activity dynamics that collectively drive complex behaviors like decision-making and emotion [3,4].In the brain-as with other types of complex systems-there are myriad ways to represent different types of dynamical patterns, many of which involve discretizing the brain into 'regions' [5,6].Neuroimaging modalities like functional magnetic resonance imaging (fMRI) measure brain activity within voxels that are typically aggregated into brain regions, yielding a region-by-time multivariate time series (MTS).While an MTS can be analyzed using statistics derived from a vast theoretical literature on time-series analysis [7], only a limited set of methods are typically used to summarize fMRI properties.For example, most resting-state fMRI studies have examined functional connectivity (FC) between pairs of brain regions by computing the Pearson correlation coefficient [8,9]-a zero-lag cross-correlation that assumes a joint bivariate Gaussian distribution, which may not always hold with resting-state fMRI data [9,10].However, MTS representations of fMRI data can be summarized at the level of individual regions, coupling between pairs of regions across distributed networks, or higher-order interactions amongst multiple regions [11].
The choice of how to represent the rich dynamical structure in a brain-imaging dataset-e.g., intra-regional activity (properties of the fMRI signal time series for a given region) or inter-regional coupling (statistical dependence of the fMRI signal time series for two regions)-and which statistical properties, or 'features', to measure is typically made manually.Of the few studies to compare alternative time-series features for functional connectivity or (less commonly) intra-regional coupling, the scope and number of the examined statistics is limited [12][13][14].This problem of which dynamical properties to capture from a complex dynamical system, and which algorithms might best capture them, is common to many data-driven studies of complex systems-leaving open the possibility that alternative ways of quantifying dynamical structures may be more appropriate (e.g., yield better performance, clearer interpretation, or computational efficiency) for a given problem at hand.Addressing the challenge of tailoring an appropriate methodology to a given data-driven problem, 'highly comparative' feature sets have recently been developed to unify a comprehensive range of algorithms from across the time-series literature, enabling broad and systematic comparison of time-series features (that would previously have been impractical).For example, the hctsa library includes implementations of thousands of interdisciplinary univariate time-series features [15,16], and the pyspi library includes hundreds of statistics of pairwise interactions [17].These algorithmic libraries have since been applied to systematically compare feature-based representations of diverse complex systems, from arm movement during various types of exercise [17] to light curves of different types of stars [18].Emerging applications of this highly comparative time-series feature approach to neuroimaging data have highlighted novel biomarkers that are particularly informative for a given application, such as distinguishing fMRI scans obtained at rest vs while watching a film [17] or comparing neurophysiological dynamics with cortical microarchitecture [19] or structural connectivity [20].Importantly, features defined outside of the neuroimaging space (such as dynamic time warping, originally developed for spoken word recognition [21]) have been shown to usefully quantify patterns in fMRI dynamics [22][23][24], underscoring the benefit of examining a comprehensive and interdisciplinary feature set.However, prior highly comparative analyses have focused on either intra-regional activity or inter-regional coupling separately, which fails to tackle the issue of which representation is optimal for a given problem.To address this gap, here we introduce a systematic method to evaluate multiple data-driven univariate time-series features, pairwise interaction statistics, and their combination, that is generally applicable to data-driven problems involving distributed time-varying systems.In the context of fMRI MTS, combining properties of intra-regional activity and inter-regional coupling is motivated by findings that the two representations synergistically improve classification performance across a variety of clinical settings including schizophrenia [25], Alzheimer's disease [26], nicotine addiction [27], and mental stress [28].Our method is applied to resting-state fMRI for case-control classification of four neuropsychiatric disorders as an exemplary case study: schizophrenia (SCZ), bipolar I disorder (BP), attention-deficit hyperactivity disorder (ADHD), and autism spectrum disorder (ASD).Most work in this space has focused on FC based on the hypothesis that neuropsychiatric disorders arise from dysfunction across distributed networks rather than individual regions [29][30][31], though this hypothesis has yet to be evaluated systematically.While intra-regional activity has been primarily overlooked, it can yield whole-brain maps of localized disruption, aiding interpretability and opening the door for new types of questions that are inaccessible using pairwise FC, such as how a given region responds to the targeted stimulation of another region [32].
Neuroimaging-based diagnosis of psychiatric disorders is an area of great interest, as diagnosis is currently based on behavioral criteria [33] and is hindered by patient heterogeneity [34,35] as well as inter-rater reliability issues [36,37].
Open-source resting-state fMRI datasets spanning neuropsychiatric disorders like schizophrenia [38] and autism spectrum disorder [39,40] have inspired the publication of increasingly complex classifiers like deep neural networks, which employ intricate algorithmic architectures that can be challenging to interpret [41][42][43].This is at odds with the core goals of neuroimaging-based classification studies, which include: (1) unpacking the neuroanatomical and functional underpinnings of a given disorder; and (2) developing a classification framework that aids the diagnosis of future patients [33,[44][45][46][47]. Instead, we used linear support vector machine (SVM) classifiers to combine features based on clear algorithms derived from interdisciplinary time-series literature with simpler classical machine learning models, aiming to clarify the types of underlying dynamical processes that distinguish brain activity across disorders [32,48].Broadly, insights into the dynamical processes that are altered can deepen understanding of mechanisms underlying disease pathogenesis and progression, potentially yielding clinically translatable biomarkers [49,50].
The present study is accompanied by a repository that includes all code needed to reproduce analyses and visuals presented here [51].

Results
Starting with a resting-state blood oxygen level-dependent (BOLD) signal MTS, in the form of brain regions sampled over time as shown in Fig. 1A(i-ii), we quantified both intra-regional dynamics (Fig. 1A(iii)) and pairwise coupling (Fig. 1A(iv)) using interpretable time-series features derived from a rich interdisciplinary scientific literature.We compared the ability of five different ways of statistically representing localized intra-regional dynamical properties and/or inter-regional FC to classify cases from controls in each of four neuropsychiatric disorders, as schematically depicted in Fig. 1B and summarized in Table 1.To summarize intra-regional BOLD activity fluctuations, we computed 25 univariate time-series features that included the catch22 feature set [53], as schematically depicted in Fig. 1C.The catch22 features were distilled from an initial library of over 7000 candidate features [16] to concisely capture diverse properties of local dynamics, such as distributional shape, linear and nonlinear autocorrelation, and fluctuation analysis ( [53], further described in Sec.4.2.1).We also included three additional features: the mean and standard deviation (SD)-based on our previous findings that these first two moments of the BOLD time series distributions are informative on their own [54]-and fractional amplitude of low-frequency fluctuations (fALFF) as a benchmark statistic for localized dynamics [55,56].Inter-regional FC was summarized using 14 statistics for pairwise interactions (SPIs) derived as a representative subset from an initial library of over 200 candidate SPIs in the pyspi library [17], including the Pearson correlation coefficient, as shown in Fig. 1C.This set of SPIs includes statistics derived from causal inference, information theory, and spectral methods, which collectively measure a variety of coupling patterns (directed vs undirected, linear vs nonlinear, synchronous vs lagged) that paint a fuller picture of inter-regional communication patterns [12,17,22].
Five distinct representations of BOLD activity were systematically compared, as illustrated in Figs.1D(i)-(v): 1.A region [Fig.1D(i)] represents each fMRI time series using a set of features capturing 25 dynamical properties of an individual brain region.Strong classification performance in this representation suggests localized, region-specific changes in neural activity in the corresponding disorder.
2. A feature [Fig.1D(ii)] represents each fMRI time series using a set of features that captures a single dynamical property from all brain regions (82 for SCZ, BP, and ADHD, and 48 for ASD).Strong classification performance in this representation suggests that a specific property of intra-regional resting activity is altered in at least one part of the brain in the corresponding disorder.

4.
A FC [Fig.1D(iv)] represents each fMRI time series using a set of features that capture all pairs of inter-regional coupling strengths computed using a single SPI.Strong classification performance in this representation suggests that a specific type of inter-regional coupling is altered amongst one or more brain region pairs in the corresponding disorder.LA5c study [38] and the ABIDE I/II studies [39,40,52].Each of the two cohorts also included cognitively healthy controls for comparison (N = 116 for UCLA CNP, N = 578 for ABIDE).C For each type of dynamical structure that we extracted from an fMRI dataset (i.e., for each feature-based 'representation' of the data), we computed interpretable time-series features that encapsulate a diverse range of activity properties.Local dynamical properties were quantified from each brain region using a set of 25 time-series features (the catch22 feature set [53] along with the mean, SD, and fALFF).Interactions between all pairs of regions were quantified using a set of 14 statistics of pairwise interactions (SPIs) (a representative subset from the pyspi package [17]).Values for both univariate time-series features and sets of pairwise coupling strengths from SPIs were captured in one-dimensional feature vectors that encapsulate a given type of interpretable dynamical structure(s).D We evaluated case-control classification performance for each neuropsychiatric disorder using linear support vector machine classifiers fit to each of five distinct ways of representing resting-state fMRI properties: (i) all 25 univariate time-series features in an individual region, A region ; (ii) whole-brain maps of an individual time-series feature, A feature ; (iii) whole-brain maps of all 25 univariate time-series features, A uni_combo ; (iv) functional connectivity (FC) networks across all region pairs using one SPI, A FC ; and (v) A FC along with all 25 univariate time-series features computed from all brain regions (A uni_combo ), termed A FC_combo .

5.
A FC_combo [Fig.1D(v)] represents each fMRI dataset as a set of features that capture all pairs of inter-regional 2.1 Analysis A region : Characterizing localized changes to intra-regional dynamics across disorders (as A uni_combo ).If this representation out-performs A FC on its own, that would suggest the presence of complementary and disease-relevant information in both localized intra-regional activity and pairwise coupling that is not captured with FC exclusively.
In all analyses, classification performance was evaluated using cross-validated balanced accuracy measured from a series of linear support vector machine (SVM) classifiers (see Methods Sec.4.3.1 for details).Each of these analytical representations is discussed in its own subsection of the results, with the ultimate aim of comparing across these representations for each neuropsychiatric disorder.

Analysis A region : Characterizing localized changes to intra-regional dynamics across disorders
Analysis A region reduces the complexity of an individual's BOLD fMRI data down to a set of 25 time-series features that encapsulate the dynamical properties of an individual brain region, as schematically depicted in Fig. 2A.If neural activity in a single brain region is consistently disrupted across patients relative to controls, that will yield strong case-control classification performance; by applying this analysis to every single brain region individually, we can interpret the results through a spatial map of disrupted fMRI dynamics in each disorder.In this analysis, we test whether intra-regional dynamical properties of BOLD activity can meaningfully distinguish cases from controls in each of the four disorders-reasoning that disease-related changes to neural activity of a single brain region would yield strong case-control classification performance, which can be interpreted through a spatial map of disrupted fMRI dynamics across disorders.Such a pattern would contrast with previous reports that neuropsychiatric disorders like schizophrenia and ADHD are characterized by inter-regional dysfunction across distributed networks, rather than intra-regional disruptions [29,55,57,58].
Multiple brain regions exhibited dynamics that significantly distinguished cases from controls in all four disorders (defined as Holm-Bonferroni corrected p corr < 0.05): 58/82 individual regions in SCZ, 45/82 regions in BP, 18/82 regions in ADHD, and 27/48 regions in ASD, with balanced accuracies for these significant regions listed in Supplementary Table S3.Across the four clinical groups, the brain region with the highest balanced accuracy varied markedly, from 54.9% in ASD up to 72.4% in SCZ as shown in Fig. 2B-noting that the threshold for significance varied across the four disorders due to differing sample sizes (see Methods Sec.4.3.3 for details and Fig. S9 for null distributions).The spatial distribution of intra-regional differences in underlying dynamics (assessed via cross-validated balanced accuracy) is shown for each brain region in Figs 2C(i)-(iv) for SCZ, BP, ADHD, and ASD, respectively.
In SCZ, BOLD dynamics of medial occipital brain regions were the most informative of diagnosis, as shown in , which may indicate an SCZ-specific signature of disrupted intra-regional activity in these medial occipital regions.We confirmed that the classification performance across brain regions was not biased by gray-matter volume differences between cases versus controls across disorders, as had been previously posited [25] (cf.Sec.4.3.4,Fig. S2A)-additionally, there was no bias in overall region volume on classification performance (Fig. S2B).
Subcortical structures, especially the thalamus, exhibited intra-regional dynamics that significantly distinguished cases from controls in each of the disorders (noting that subcortical data were not available for the ASD dataset).For BP, the right hippocampus achieved the highest balanced accuracy of all regions (permutation test 65.6 ± 11.4%, p corr = 9 × 10 −18 , corrected across 82 brain regions), followed closely by the left caudate (64.9 ± 13.1%, p corr = 2 × 10 −15 ) and left thalamus (64.4 ± 12.6%, p corr = 7 × 10 −14 ), as indicated with darker  ii.shading in Fig. 2C(ii).Dynamical signatures of activity in the bilateral thalamus also significantly distinguished cases from controls in SCZ and ADHD, as shown in Figs 2C(i),(iii).
Fewer brain regions exhibited distinctive changes in resting-state dynamics in ADHD (18 out of 82 regions), with overall lower classification accuracies [all below 62%; cf.Fig. 2C(iii) and Table S3].It is worth noting that the ADHD sample (N = 39) was smaller than that of SCZ (N = 48), BP (N = 49), or ASD (N = 513), though inverse probability weighting still boosted classifier performance beyond predicting the majority class for the top-performing regions in ADHD (cf.Fig. S3A).For the larger ABIDE-based ASD sample [39,40], we found that more than half of the examined brain regions (27 out of 48) yielded significantly distinguishable dynamics relative to controls, but classification performance was low overall-with all balanced accuracies below 55% (cf.Fig. 2C(iv) and Table S3).
Here, we found that the dynamical signatures of multiple individual brain regions were significantly informative of diagnosis in each neuropsychiatric disorder-emphasizing some regions that are traditionally associated with psychopathology while highlighting relatively under-examined regions for future investigation.

Analysis A feature : Assessing changes in brain-wide maps of individual dynamical properties across disorders
In this section, we asked whether an individual time-series feature of intra-regional dynamics measured across all brain regions could meaningfully distinguish cases from controls.As schematically depicted in Fig. 3A Brain-wide maps of 22 out of the 25 intra-regional time-series features yielded significant balanced accuracies in at least one of the four disorders (permutation test p corr < 0.05, corrected across 25 features), with balanced accuracies for these significant features listed in Supplementary Table S4.The mean classification performance of these time-series features is shown in Fig. 3B, with the maximum balanced accuracy ranging from 58.3 ± 15.4% in ADHD up to 68.9 ± 13.2% in SCZ-reflecting a lower range than that from looking at individual brain regions in A region (Sec.2.1).We found overall higher accuracies in SCZ and BP than in ADHD or ASD, reflecting a similar trend in the performance of models based on individual brain regions (A region , see Fig. 2C).Only one time-series feature significantly distinguished cases from controls in all four disorders: a periodicity metric that measures the first local maximum of the autocorrelation function (labeled as 'periodicity' here, as described in Wang et al. [59]).The fALFF (a commonly used fMRI time-series feature) also performed well, yielding significant classification performance for SCZ, BP, and ASD (though not for ADHD)-supporting its utility as a standard statistic for comparing disease-associated changes to local BOLD activity.None of the features achieved classification accuracies over 60% for ADHD, a threshold cleared by several individual brain regions in A region (see Fig. 2B-C), suggesting that ADHD is not distinctively characterized by individual properties of intra-regional alterations to fMRI dynamics.
Across disorders, the other top-performing time-series features captured quantitatively similar properties to fALFF-namely, properties of linear autocorrelation and power spectrum structure.To confirm this, we focused on the nine time-series features that yielded significant accuracies in three or more disorders (outlined by a box in Fig. 3B) and computed a similarity index |ρ feature |, defined as the absolute Spearman rank correlation between feature values measured from all brain regions and all participants.As shown in Fig. 3C 3B).Altogether, we do not find compelling evidence that increasing the computational complexity beyond the fALFF is warranted for classification performance, suggesting that this linear and parsimonious feature is a suitable representative of this class of resting-state intra-regional dynamics over a comprehensive range of alternatives.
The BOLD SD and 'periodicity' exhibited more distinctive brain-wide values compared to the other cluster of linear properties, and also ranked among the highest-performing features across disorders, although they capture distinctive temporal properties to each other as evidenced by the |ρ feature | value of 0.09 (Fig. 3C).Notably, fALFF was less similar to 'periodicity' (|ρ feature | = 0.54) than to the other top-performing linear autocorrelation features in the larger cluster in Fig. 3C (all |ρ feature | ≥ 0.65).This lower similarity suggests that periodicity captures a different aspect of intra-regional fMRI dynamics that is more distinctively altered in ADHD compared with fALFF-although with relatively low balanced accuracy (58.3±15.4%,p corr = 2×10 −5 ).While BOLD SD has previously been reported as a powerful biomarker across case studies (including SCZ [60,61] and ASD [62]; see Discussion), Power et al.
[63] have also quantitatively linked brain-wide BOLD to head motion in the scanner, raising concern that head motion could be a confounding factor here.The two datasets (UCLA CNP and ABIDE) were preprocessed separately, which could yield differing residual motion artifacts [64], but we applied the same motion-based quality control filter to both datasets (see Methods, Sec.4.1.3)such that we do not expect motion-related artefacts in the preprocessed BOLD time series to drive case-control differences.Still, we evaluated this potential confound by comparing the BOLD SD values averaged across all brain regions with the mean framewise displacement (FD) value per participant, which revealed significant whole-brain correlations for control participants in both studies (UCLA CNP and ABIDE) as well as ADHD and ASD participants (Fig. S4A).There was no association between BOLD SD and mean FD for SCZ (Spearman's ρ = 0.19, p = 0.2), which is notable as intra-regional BOLD SD was a top-performing feature for SCZ.
Further examination within individual regions revealed significant associations between BOLD SD and mean FD for 11 brain regions in UCLA CNP and all 48 brain regions in ABIDE (cf.Fig. S4B).These associations indicate that BOLD SD is indeed related to head motion (as previously reported [65]), either reflecting incomplete nuisance signal regression and/or shared biological mechanisms driving neural signal variability and head movement.
In order to understand transdiagnostic similarities in the types of intra-regional properties that distinguish cases from controls, we computed a feature-based classification similarity score (ρ disorder ) for each pair of disorders based on the Spearman rank correlation among classification accuracies for the 25 time-series features.A high ρ disorder means that the set of 25 time-series features ranked similarly in classification performance between the two disorders.As shown in Fig. 3D, SCZ and BP exhibited the highest ρ disorder = 0.59 (p corr = 0.01, corrected across 6 pairwise comparisons), indicating that similar properties of intra-regional dynamics were informative of SCZ and BP diagnoses.Importantly, this does not imply that the features of localized activity were disrupted with a similar spatial topology between the two disorders, as SCZ and BP exhibited distinctive region-wise brain maps (in Fig. 2C).
In ASD, performance across time-series features showed positive but weaker feature-based similarity scores with SCZ (ρ disorder = 0.36, p corr = 0.3) and BP (ρ disorder = 0.32, p corr = 0.4), suggesting modest similarity in the types of intra-regional features that were informative across these disorder comparisons.By contrast, ADHD exhibited the most distinctive time-series feature performance patterns to the other three disorders, with a negative ρ disorder in each case, suggesting that the highest-performing features in the other disorders were the lowest performing features for ADHD, and vice versa.However, since only three of the 25 features yielded significant classification performance for ADHD, the negative correlation estimates are based on considerable noise-and should therefore be interpreted with caution.
In this section, A feature revealed that the fALFF and similar statistics related to linear autocorrelation performed well across disorders compared to other more complex and/or nonlinear statistics, and that similar properties of resting-state fMRI dynamics were distinctively altered in SCZ and BP.

Analysis A uni_combo : Combining multiple brain regions and temporal properties to represent whole-brain fMRI dynamics across disorders
In contrast to the previous two reduced representations of intra-regional dynamics across the brain-in either a single region (A region ) or of a single time-series property (A feature )-we next asked if a more comprehensive representation that integrates multiple time-series properties of resting-state activity measured from all brain regions could better distinguish cases from controls.An improvement in performance using this much higher-dimensional representation of the brain-wide dynamics (2050 dimensions for the UCLA CNP cohort: SCZ, BP, and ADHD, and 1200 dimensions for the ABIDE ASD cohort) would suggest that multiple properties of intra-regional activity measured across multiple spatial locations are important for characterizing a given disorder, in a way that cannot be simply reduced to a single region or single time-series feature.To investigate this question, we concatenated all intra-regional features computed from the whole brain into a single 'combination' classifier model, A uni_combo , as depicted schematically in Fig. 4A.S5 (for which the balanced accuracy was significant).
Since this combination classifier involves a considerably higher feature-space dimensionality, we asked whether the increased input feature space improved classification performance-or, alternatively, if a more parsimonious, reduced representation (A region and/or A feature ) is more suitable for a given disorder.In Mean Cross-validated Balanced Accuracy (%) iii.
Fold-wise Balanced Accuracy (%) For ADHD, there were fewer individual brain regions (18) and time-series features (3) that significantly distinguished cases from controls compared with the other disorders.We sought to test the possibility that the null performance of the A uni_combo classifier may be at least partially attributable to noise accumulation [69].As a simple test for overfitting via noise accumulation, we applied principal components analysis (PCA) to the full region × feature matrix for each disorder and the corresponding control cohort (see Fig. S5A for plots of the first two principal components).
Since classification models based on individual brain regions yielded the lowest-dimensional classifiers (A region : 25 time-series features per model), we selected the first 25 principal components (PCs) from the region × feature PCA for each case-control comparison group for consistency.Using the same classification procedure as implemented for the full region × feature matrix, we found that classifier performance with 25 PCs was comparable to that of the full feature space across all four disorders (cf.Fig. S5B).To complement this dimensionality reduction technique, we also evaluated the performance of L1-regularized SVM classifiers ('LASSO', see Methods Sec.4.3.1),finding similar or poorer results with little evidence for classification improvement (cf.Fig. S5C).While a more exhaustive search for the optimal SVM hyperparameters and/or number of PCs might improve classifier performance, our preliminary analysis suggests that the poor performance of the combination classifier in ADHD is not due to overfitting alone, but rather to high heterogeneity in a small sample size.

Analysis A FC : Comparing features of inter-regional functional connectivity across disorders
In this section, we examined the performance of classification models based on inter-regional coupling strengths between all pairs of regions, with the aim of understanding which statistical formulations of time-series dependencies are optimally suited to capture case-control differences in brain-wide FC in each of the four disorders.We compared 14 SPIs from the pyspi library [17] that capture different types of inter-regional coupling, including nonlinear, time-lagged, and frequency-based interactions, derived from literature including causal inference, information theory, and spectral methods (see Methods, Sec.4.2.2 for more details).
As depicted schematically in Fig. 5A, each SPI was computed between the BOLD time series of all region-region pairs per participant and evaluated with the same cross-validated classification procedure (see Methods, Sec.4.3.1).
We found that 11 out of the 14 SPIs yielded a statistically significant balanced accuracy in at least one of the neuropsychiatric disorders, which are shown in a heatmap in Fig. 5B and summarized in Supplementary Table S6.
The significant performance of multiple SPIs suggests that inter-regional communication patterns were disrupted across disorders in a way that pairwise FC metrics are generally well suited to capture, consistent with prior work [70,71].The maximum performance of any individual SPI ranged from 54.0 ± 8.3% in ADHD up to 69.1 ± 12.0% in SCZ, reflecting our findings in previous sections that SCZ was the most clearly distinguishable from controls while ADHD was the least distinguishable.Amongst the eleven significant SPIs, the Pearson correlation coefficient performed well, yielding some of the highest classification accuracies for SCZ (  .DI was part of a larger cluster of SPIs that included coherence magnitude and cointegration (Fig. 5C), which generally exhibited poorer classification performance across disorders, as shown in Fig. 5B.This cluster also included Φ * -an estimate for the extent of integrated information [72,73] between the corresponding pair of regions-which was the only SPI to significantly distinguish cases from controls across all four disorders, although all balanced accuracy averages were below 60%.Notably, Φ * was also the only SPI to significantly distinguish ADHD cases from controls, at 54.0 ± 8.3% (permutation test p corr = 2 × 10 −6 , corrected across 14 SPIs).However, this only slightly surpassed the Pearson correlation coefficient in ADHD (which yielded 53.2 ± 10.7% balanced accuracy, and was not significant, p corr = 0.09).The other two clusters in Fig. 5C contained the phase slope index (PSI) calculated in the time-frequency domain on its own, with transfer entropy (TE) and the PSI calculated in the frequency domain grouped together.These three SPIs yielded among the lowest balanced accuracies of the 11 SPIs depicted in Fig. 5B.
In order to understand whether similar FC metrics distinguished cases from controls well across disorders, we computed a disorder similarity index, ρ disorder , that measured the Spearman rank correlation between all 14 SPIs for the given pair of disorders-noting that we included all SPIs in this analysis for completeness.As shown in the heatmap in Fig. 5D, the relative performance of SPIs was very similar between the three disorders for which the A FC models showed significant performance: SCZ, ASD, and BP (all ρ disorder > 0.5).This suggests high similarity in the types of inter-regional coupling alterations that are relevant to these disorders.By contrast, there were minimal commonalities between ADHD and the other disorders, as expected given the null performance for all SPIs other than Φ * .
Overall, in this section, we found that several FC statistics measured across all region-region pairs distinguished cases from controls with significant balanced accuracy for SCZ, BP, and ASD.For these three disorders (SCZ, BP, and ASD), FC statistics showed largely similar classification performance, with the Pearson correlation coefficient yielding among the highest balanced accuracy.

Analysis A FC_combo : Integrating local regional dynamics with pairwise coupling into one classification model across disorders
In previous sections, we represented participant-level BOLD fMRI data using either properties of intra-regional dynamics or inter-regional FC separately.We finally asked whether a unified representation, that combines both local dynamics and pairwise coupling, would offer complementary information about resting-state fMRI activity that could better distinguish cases from controls.To test this, for each SPI, we concatenated the pairwise region-region matrix (A FC ) with the full region × univariate feature matrix (A uni_combo ) which served as classifier inputs for an A FC_combo representation of an fMRI dataset, as depicted schematically in Fig. 6A.
All 14 SPIs significantly distinguished SCZ, BP, and ASD cases from controls with the inclusion of localized intra-regional dynamics, with balanced accuracies for each A FC_combo model displayed in a heatmap in Fig. 6B and summarized in Supplementary Table S7.None of the A FC_combo models were significant for ADHD.The peak classification performance across A FC_combo models for a given disorder range from 59.4% in ASD to 71.6% in SCZ.As with our analyses based on pairwise FC alone (A FC , cf.Fig. 5B), the Pearson correlation remained a top-performing SPI.DTW and DI were also among the top SPIs for SCZ, BP, and ASD, but with slightly lower performance than the Pearson correlation coefficient across the disorders.With the inclusion of intra-regional properties, Φ * remained a top-performing SPI for ASD and rose to become a top-performing SPI for both SCZ and BP.While Φ * was the only SPI to significantly distinguish ADHD cases from controls on its own (in A FC ), its  iii. iv.
Balanced Accuracy (%) performance dropped to null with the addition of brain-wide local dynamics (A FC_combo )-with the drop in performance in a higher-dimensional space likely arising from the small, heterogeneous sample.
We next aimed to more clearly understand whether the inclusion of univariate dynamics (A uni_combo ) into an SPI-based model (A FC , based on pairwise coupling strengths) increased the model accuracy (A FC_combo ), despite involving a considerable increase in feature-space dimensionality.We compared the classification performance for each SPI on its own in the FC-based model, A FC , to that of the combination model, A FC_combo .The SPI-wise balanced accuracy for each type of model is depicted in Fig. 6C, revealing that most SPIs better distinguished cases from controls with the inclusion of brain-wide statistics of intra-regional dynamical properties.We quantified the difference in performance for each SPI using corrected t-tests ( [74], see Methods Sec  3.1 Multiple dynamical signatures of resting-state activity within individual brain regions are informative of diagnosis intra-regional dynamics (t corr = 2.9, p corr = 0.03, corrected across 14 SPIs).The addition of intra-regional time-series features did not just improve the performance of significant A FC models, but also elevated the performance of non-significant SPIs to yield strong performance.This was the case with the barycenter (a measurement of the center of mass of the BOLD signal time series between the pair of brain regions, with the barycenter magnitude reflecting the extent of dynamic coupling [75,76]), which showed significantly improved ability to distinguish cases from controls in all three disorders in A FC_combo despite null performance on its own in A FC .

Discussion
There are myriad ways in which a data analyst can extract interpretable feature-based representations of dynamical structures contained in a multivariate time series, like those measured with fMRI.From this statistical smorgasbord, a given researcher typically chooses a set of dynamical properties to study for a given problem subjectively, such as a set of pairwise Pearson correlation coupling strengths or the fALFF in a given set of brain regions.The lack of systematic methodological comparison thus makes it unclear from any given study whether these chosen dynamical properties are optimal, or whether alternative-and potentially simpler and more interpretable-statistics may outperform those reported.To address these concerns, here we introduced a systematic comparison of feature-based representations of fMRI time series, based on localized intra-regional dynamics, pairwise inter-regional coupling, and their combination, allowing us to systematically capture and compare different facets of fMRI dynamics.
Our results demonstrate the benefits of such comparison by identifying the most parsimonious types of structures that are relevant for a given application, revealing disorder-specific signatures across four neuropsychiatric groups, particularly for SCZ and BP.
Our findings broadly support the use of linear time-series analysis techniques that are commonly used for fMRI data analysis, both at the level of individual brain regions (for which methods based on linear autocorrelation were top-performing metrics) and at the level of pairwise coupling (for which contemporaneous Pearson correlation was a top-performing metric), while simultaneously identifying novel high-performing metrics that warrant future investigation (including directed information to capture asymmetric information flows, and novel 'periodicity'-based univariate time-series features).While the intra-regional dynamical properties of some individual brain regions can yield good performance, we found improvement in combining dynamical properties across the whole brain (in SCZ and BP), consistent with the view that these neuropsychiatric disorders involve disruptions to brain dynamics that are spatially distributed and multifaceted [30,50,77].Pairwise coupling strengths also served as an informative way to represent dynamical structures related to case-control differences, although our results demonstrated improvements when combining both intra-regional and inter-regional structures in a unified representation of an fMRI MTS dataset, again consistent with complex and spatially distributed disruptions.Our work provides a methodological foundation for systematically invoking representative features from a rich interdisciplinary literature on time-series analysis to determine the most appropriate way(s) of summarizing interpretable dynamical structures in MTS datasets.The approach presented here-which draws on recent algorithmic catalogs of the univariate and pairwise time-series analysis literatures [15,17]-is generalizable to a wide range of problems across neuroimaging modalities and applications, as well as to a vast array of science and industry problems in which complex time-varying systems are measured and analyzed.

Multiple dynamical signatures of resting-state activity within individual brain regions are informative of diagnosis
Our results in A region (in Results Sec. 2.1) demonstrate that statistical properties of univariate BOLD dynamics within many individual brain regions significantly distinguished neuropsychiatric patients from controls across the four disorders.The disorders differed substantially in the spatial distribution of the disruption-e.g., with widespread significant disruption in SCZ compared with relatively sparse disruption in ADHD-and in the distinguishability of these alterations-e.g., > 70% balanced accuracy in multiple regions in SCZ, but < 55% for all regions in ASD.
In fMRI studies, intra-regional activity is often overlooked in favor of inter-regional functional connectivity, in part because of the hypothesis that neuropsychiatric disorders arise from disruptions to inter-regional coupling and integration, with local dynamics alone insufficient to explain or predict diagnosis [25,78,79].These conclusions about the lack of localized disruptions are largely based on previous work that has examined regional properties based on graph theory (e.g., degree centrality or node strength for a seed region in an FC matrix) [80,81] or a handful of time-series features like the fALFF or regional homogeneity (ReHo) [82], which could miss more nuanced changes to the BOLD dynamics of a given brain region.A prevailing expectation is that fMRI is too noisy and poorly sampled in time for more sophisticated time-series analysis methods to provide additional useful information about intra-regional dynamics (e.g., that have been applied to EEG data) [83][84][85].However, prior work comparing a large number of time-series features from individual brain regions using fMRI data has suggested that venturing beyond standard linear time-series analysis methods may be useful, such as the kurtosis-related distributional-shape features highlighted in Shafiei et al. [86].Our results demonstrate that combining multiple properties of intra-regional fMRI activity from an individual brain region can meaningfully distinguish cases from controls in multiple neuropsychiatric disorders, suggesting that the breadth of features and/or the integration of multiple features into a unified statistical representation of fMRI time series can provide relevant information about disorder-specific alterations.Examining dynamics within individual brain regions enables spatial interpretability through visualizing brain-wide maps of classification performance, providing a clearly interpretable region-by-region picture of activity disruptions.This ability to characterize changes in BOLD dynamics at the level of individual regions has been key to addressing questions about regional differences in the response to spatially targeted brain stimulation [32,87].It also provides a clearer way to test molecular hypotheses about regional disruption in disorders which can be compared with rich multimodal region-level atlases spanning morphometry, cortical hierarchy, and multi-omics [86,88,89] to more deeply characterize the physiological underpinnings of disease-relevant changes in future work.
Across disorders, our findings from analysis A region recapitulated prior results of individually informative brain regions, while also identifying novel region-specific changes in local dynamics that generate new hypotheses for future work.For example, resting-state properties of medial occipital brain regions like the pericalcarine and cuneus were particularly informative for SCZ classification, consistent with previous reports of altered fALFF in medial occipital regions [90][91][92] that may relate to disrupted perceptual organization [93] and/or visual hallucinations [94].
Additionally, regions in the subcortex like the thalamus and caudate nucleus exhibited resting-state dynamics that distinguished cases from controls across all three disorders (noting that subcortical data was not available for the ABIDE cohort).This is in line with prior studies highlighting subcortical disruption in BP [95,96] and SCZ [97,98], as well as compelling evidence that subcortical (specifically, striatal) dopamine dysfunction drives antipsychotic treatment efficacy [99,100].By contrast, subcortical alterations are seldom described in ADHD, in which pairwise FC receives the most attention [101].
Looking at individual properties of intra-regional dynamics measured across the whole brain in analysis A feature (in Results Sec. 2.2), we found that properties related to linear autocorrelation (including Fourier power spectrum structure, such as fALFF) were top performers in distinguishing cases from controls across neuropsychiatric disorders (see Fig. 3C).The high classification metrics validate fALFF and related linear time-series analysis tools as useful techniques for capturing disorder-relevant dynamical properties of resting-state fMRI data.The conceptually related 'periodicity' feature (which captures the first peak in the autocorrelation function, satisfying conditions described in Wang et al. [59]) was the only statistic to significantly distinguish cases from controls in all four disorders.As this 'periodicity' has not been explored for fMRI previously (to our knowledge), future work with this statistic is warranted, with an opportunity to optimize some of its parameters for fMRI data specifically.The 'ACF_timescale' statistic was also a high-performing feature, which was highly correlated to the fALFF (as has been noted in prior work [20])-potentially detecting similar phenomena to the previously reported changes to intrinsic neural timescales in SCZ [102,103] and ASD [103,104].Some of the time-series features we evaluated were conceptually similar to these top-performing linear autocorrelation metrics but yielded null classification accuracies, demonstrating that algorithms derived from theoretically similar foundations can display quite different performance in a given application.Interestingly, a nonlinear analog to the 'ACF_timescale'-'AMI_timescale', which captures the first minimum of the automutual information function (AMI)-performed worse than its linear counterpart across disorders.This is consistent with the view that BOLD dynamics (which are noisy and sparsely sampled in time) are well approximated by a linear stochastic process such that methods aiming to capture more complex (e.g., nonlinear) 3.2 Functional connectivity can better separate cases from controls with the inclusion of local dynamics dynamical structures may not be beneficial, as recently reported in Nozari et al. [105].
The BOLD SD stood out relative to the other top-performing univariate features, particularly as it is sensitive to the raw time-series values while all other features (besides the mean) were computed after z-score normalization (to focus on underlying dynamics of the time series in a way that does not depend on the measurement scale).We found that the brain-wide BOLD SD was particularly informative of SCZ and ASD diagnosis relative to controls (see Fig. 3B), recapitulating our previous work showing that the BOLD SD outperformed other univariate time-series features for SCZ [54].Changes to BOLD signal variability have previously been described in both SCZ [60,61] and ASD [62], with potential links to aberrant sensory integration.More broadly, regional BOLD SD alterations have been reported in settings ranging from healthy aging [106,107] to Alzheimer's disease [108], and it is hypothesized that BOLD signal variance is linked to functional integration [109] as well as numerous molecular and cytoarchitectural properties [110].Despite its widespread utility and reliability as a resting-state fMRI statistic [111], our observed association of SD with head movement (a non-neural confound) makes us cautious about interpreting BOLD SD-related changes as being driven by neural activity; this warrants further investigation in future work.
When we combined intra-regional dynamical properties measured across the entire brain in the combination model A uni_combo (in Results Sec. 2.3), we found that this comprehensive representation was particularly useful for classifying SCZ and BP cases relative to controls (see Figs 4B,C) The high classification performance of the combination classifier suggests that SCZ and BP are characterized by multiple types of changes to resting-state activity across multiple locations in the brain.Given the demonstrated biological heterogeneity in neuropsychiatric disorders (particularly SCZ) [112], effective exploration of larger time-series feature spaces could identify subtypes within a given disorder with distinctive dynamical profiles [113]-establishing the foundation for data-driven nosology [114].By contrast to the expanded representations for SCZ and BP, reduced representations (i.e., individual brain regions or individual time-series features) better distinguished ADHD and ASD cases from controls than this combination classifier (see Fig. 3B), pointing to greater inter-individual heterogeneity relative to the (smaller) sample size for ADHD.In ASD, the BOLD SD was the dominant high-performing feature among the 25 we compared, emerging as the top-performing metric in 99 out of 100 training validations-which suggests that the other univariate time-series features contributed less disease-relevant information than the SD, demonstrating that systematic comparison can uncover potential model simplifications to select a parsimonious and interpretable feature-based representation.

Functional connectivity can better separate cases from controls with the inclusion of local dynamics
Through our comparison of SPIs in A FC (see Results Sec. 2.4), we identified different ways of quantifying inter-regional coupling that were informative of case-control differences across the four psychiatric disorders.Out of the 14 evaluated SPIs, we found that the Pearson correlation coefficient was a top-performing statistic across disorders, particularly with the inclusion of whole-brain intra-regional dynamical properties in A FC_combo (see Results

Sec. 2.5)
. Taken together with the strong case-control classification performance of fALFF measured across the whole brain, this suggests that linear time-series analysis methods are overall well-suited for capturing the salient dynamical properties of resting-state BOLD MTS [20,86,115].Our comparisons also highlighted interesting alternative statistics with different behavior but similarly high performance to the Pearson correlation coefficient, such as DI [116,117] and Φ * [72,73]-which involve more conceptually and computationally complex formulations of pairwise interactions that are seldom applied to fMRI datasets.While there was little evidence that increasing the complexity in an FC measurement beyond the Pearson correlation coefficient substantially improved neuropsychiatric disorder classification here, we previously found that alternative metrics like DI out-performed the Pearson correlation coefficient in an fMRI-based problem [17].The alternative FC metrics examined here could be explored in future work using data with higher temporal resolution and signal-to-noise ratios (e.g., from MEG [19,118])-a setting in which we might be able to detect more complex types of interactions in the brain (e.g., that are nonlinear and/or time-lagged).Of note, the Pearson correlation coefficient is undirected and therefore yielded half as many input attributes to the linear SVM compared with directed SPIs like DI (somewhat complicating a fair comparison between SPIs), in which case dimensionality reduction and/or feature selection techniques may be beneficial.with significant balanced accuracy when we also included brain-wide maps of all 25 intra-regional time-series features-with several significant improvements from the FC matrix on its own after correcting for repeated cross-validation and multiple comparisons.This finding supports previous work indicating that combining local and pairwise properties of resting-state fMRI data can synergistically improve classification performance [25][26][27][28].

SCZ and BP were more accurately distinguished from control participants than ADHD or ASD
It is difficult to directly compare balanced accuracy values across disorders given differences in sample sizes and acquisition sites (both of which are likely to attenuate out-of-sample classification performance [69]).However, we observed that, across all representations, SCZ and BP cases were distinguished from controls with higher balanced accuracy, while ASD and ADHD cases were more difficult to classify.By systematically comparing the same set of intra-and inter-regional time-series features across neuropsychiatric disorders, we quantified the degree of similarity in feature performance between pairs of disorders.SCZ and BP exhibited high similarity in the types of time-series features that were most informative of diagnosis, suggesting similar types of changes to resting-state fMRI dynamics-with previous work suggesting BP and SCZ share both symptom profiles and multi-omic alterations [119,120].
While previous work suggests that ADHD also shares etiology with SCZ and BP based on self-reported personality traits and symptom domains [121], we found that ADHD was quite dissimilar to the other disorders in the types of features that distinguished cases from controls, particularly for localized intra-regional properties.We observed low classification performance in the ADHD group relative to SCZ and BP from the UCLA CNP cohort, which has previously been (partially) attributed to its smaller sample size [80].However, the overall low classification performance contrasts with previous reports of higher classification performance in a smaller disorder class-which is partially due to various sources of bias [122,123].This inverse relationship between sample size and classification performance has also been reported in the ABIDE dataset previously [66-68, 124, 125].There are a few potential explanations for why ADHD and ASD were less distinguishable from healthy controls than SCZ or BP in general.
We included ASD and control participants from all ABIDE sites and did not account for imaging site in our classification analyses, following a recent meta-analysis reporting a lack of site-specific effects on classification performance [126]-although future work could evaluate the effect of multi-site integration techniques such as ComBat harmonization [127,128] on intra-and inter-regional feature performance.Another difference with the ASD group relative to the other disorders is that homotopic region pairs were consolidated into one bilateral region, which could obscure hemisphere-specific changes to localized dynamics in ASD [129,130].It is possible that these populations exhibited changes to neural functional architecture that were more heterogeneous and individual-specific [123,131,132], making it difficult to identify an effective SVM decision boundary across representations.It is also possible that the ADHD and ASD samples generally included individuals with less severe symptoms than SCZ or BP cases [133], although we did not incorporate questionnaire responses or neuropsychological assessment data in the scope of this analysis.Alternatively, it could be that the features of intra-and inter-regional dynamics we examined were not sensitive to changes in fMRI dynamical structure that occur in ADHD and/or ASD.

Methodological limitations and considerations for future work
The findings presented here should be considered in the context of methodological limitations, described in following.
While the goal of this study was to compare representations without optimizing classifier hyperparameters or performing feature selection, the balanced accuracy estimates reported here are lower than those reported in some other studies with the same datasets for SCZ [25,134] and ASD [126].As our main aim was to demonstrate how different types of interpretable dynamical structures could be compared systematically, we chose to use linear SVM classifiers to focus on simplicity.However, with such large feature spaces when multiple features are compared across thousands of brain-region pairs, alternative regularization may be beneficial-which could be accomplished through nested cross-validation for hyperparameter tuning and/or different regularization approaches, Several types of feature-based representations of fMRI time series-e.g., A uni_combo , A FC , and A FC_combo -involved a high-dimensional feature space that exceeded the number of samples, a setting referred to as the 'p ≫ n' problem [137].For example, while the combined region × feature matrix evaluated in the combination classifier A uni_combo encapsulated these reduced representations, increasing the number of input attributes to a classifier does not necessarily improve its performance-depending on the 'latent dimensionality' of the dataset, defined as the number of meaningful variables inferred from the data that capture underlying essential patterns [138,139].While the two dimensionality reduction techniques we evaluated (PCA and LASSO regularization) did not improve classification performance for ASD or ADHD, this was not an exhaustive investigation.Future work could systematically evaluate ways to reduce (spatial) dimensionality in a biologically informative manner (e.g., condensing region pairs into canonical functional networks [140] or applying similarity network fusion [141,142]).It may also consider relaxing the picture of interacting spatially localized and contiguous brain regions, towards spatially distributed modes, extracted as components through dimensionality reduction [143,144] or geometrically [145,146].
Of note, previous work suggests that heterogeneous input data types (e.g., multiple brain regions, local dynamics versus pairwise coupling) require more nuanced data fusion techniques to be properly combined into one classifier [147].Future applications of this systematic framework might consider such data fusion techniques, which are amenable to the inclusion of multimodal connectivity networks, as described in Hansen et al. [148], for example.
While beyond the scope of this study, combining data fusion with other dimensionality reduction techniques can mitigate collinearity and enable accurate interpretation of linear SVM coefficients to clarify which brain regions/networks and time-series properties contribute more to a given decision boundary.More broadly, techniques to distill down the feature space will be of great utility to analyzing complex system MTS in general, both in terms of computational demand and mitigating noise accumulation.
Here we focused on five key ways of extracting intra-regional and inter-regional features from an fMRI dataset, but many other representations of spatiotemporal data could also be compared in future work.We supplied all edge weights between all regions directly as input attributes to the classifiers, but future work could reconfigure this representation to combine inter-regional graph network measures for a given seed region with its intra-regional dynamics.This includes quantifying properties of the networks defined by pairwise FC matrices (cf.highly comparative graph analysis [149]), statistics of spatiotemporal patterns (like spirals and traveling waves [150,151]), and higher-order (beyond pairwise) dependence structures [152].We also focused here on representative subsets of the univariate time-series analysis literature (the catch22 subset of over 7000 features in the hctsa feature library [16]) and pairwise dependence literature (14 representative SPIs from the full pyspi library of over 200 SPIs [17]).
Note that neither the univariate features nor the SPIs measured here were specifically tailored for neuroimaging applications; for example, the catch22 set was derived based on classification performance across 128 diverse univariate time series datasets ranging from beef spectrograms to yoga poses [153].Future work could consider ways to incorporate the full sets of local and pairwise time-series properties or to derive reduced subsets with algorithmic approaches tailored to a particular dataset.Broadening the scope of comparison in these ways-of both types of dynamical structures and algorithms for quantifying them-comes with associated issues of statistical power required to reliably pin down specific effects that future work will need to carefully consider.

Methods
An overview of our methodology is illustrated in Fig. 1.We extracted the BOLD time series from each resting-state fMRI volume, which were analyzed at the level of individual brain regions and pairs of brain regions (Fig. 1A).We included participants from four neuropsychiatric disorder groups derived from two main cohort studies, as depicted in Fig. 1B and summarized in Table 1.For each participant, we computed a set of time-series features reflecting the local dynamics of a single region's BOLD time series and coupling between a pair of brain regions (Fig. 1C).We compared the performance of these univariate and pairwise features separately, and in combination, using linear classifiers for each disorder-specifically testing five different ways of capturing dynamical structure, as shown in Fig. 1D and outlined in the following.

UCLA CNP
Raw BOLD resting-state fMRI volumes consisting of 152 frames acquired over 304 seconds were obtained from the University of California at Los Angeles (UCLA) Consortium for Neuropsychiatric Phenomics (CNP) LA5c Study [38].This included cognitively healthy control participants as well as participants diagnosed with schizophrenia (SCZ), attention-deficit hyperactivity disorder (ADHD), and bipolar I disorder (BP).Details of diagnostic criteria and behavioral symptoms have been described previously, along with imaging acquisition details [38].Imaging data were preprocessed using the fmriprep v1.1.1 software [154] and the independent component analysis-based automatic removal of motion artifacts (ICA-AROMA pipeline [155], as described previously [156]).Additional noise correction was performed to regress out white matter, cerebrospinal fluid, and global gray-matter signals using the ICA-AROMA + 2P + GMR method [156][157][158].Time series of 152 samples in length were extracted from the noise-corrected BOLD resting-state fMRI volumes for 68 cortical regions [159] and 14 subcortical regions [160] spanning the left and right hemispheres.

ABIDE
Preprocessed BOLD resting-state fMRI time-series data consisting of 180 time points were obtained from the Autism Brain Imaging Data Exchange (ABIDE) I & II consortium [39,40], working with a dataset specifically aggregated for an international classification challenge by Traut et al. [52] that included healthy controls and participants with autism spectrum disorder (ASD).Imaging data was preprocessed by the classification challenge authors [52,126] using the standard pipeline from the '1000 Functional Connectomes Project', which includes motion correction, skull stripping, spatial smoothing, segmentation, and noise signal regression [126].We opted to use the preprocessed time series corresponding to the Harvard-Oxford cortical atlas, which is published with the FSL software library [161] and includes 48 regions, noting that homotopic region pairs were consolidated into singular bilateral regions.

Quality control
After preprocessing, we identified six participants from the UCLA CNP dataset (IDs: sub-10227, sub-10235, sub-10460, sub-50004, sub-50077, and sub-70020) for whom the ICA-AROMA + 2P + GMR pipeline yielded constant zero values across all timepoints and brain regions.These six participants (N = 3 Control, N = 2 SCZ, N = 1 ADHD) were excluded from all further analyses.Head movement in the scanner was evaluated using framewise displacement (FD) calculated from the six framewise head-motion parameters (corresponding to rotation and translation in the x, y, and z planes) using the method of Power et al. [162].We found that SCZ, BP, and ASD-but not ADHD-cases exhibited significantly higher mean FD relative to control participants, consistent with prior studies [163][164][165] (all p < 0.05, Wilcoxon rank-sum test).Head movement (mean FD) alone significantly distinguished cases from controls for SCZ (74.0 ± 12.5%), BP (60.7 ± 11.4%), and ASD (55.5 ± 4.3%), though not for ADHD (44.0 ± 11.5%; Fig. S6B).In order to mitigate the potential confounding effect of head motion on the dynamical BOLD properties with putative neural causes, we applied the movement-based exclusion criteria described in Parkes et al. [157] (labeled as 'lenient' criterion in that study), such that any participant with mean FD > 0.

Participant summary statistics
After performing quality control, we retained between N = 39 and N = 578 participants per clinical group across the two studies, with summary statistics provided in Table 1.There were significant differences in age between controls and SCZ (Wilcoxon rank sum test, p = 7 × 10 −4 ) and BP (Wilcoxon rank sum test, p = 0.01), but not between controls and ADHD or ASD.There were also significant differences in the distribution of males versus females between UCLA CNP controls and SCZ (X 2 (1, N = 164) = 5.7, p = 0.01) and between ABIDE controls and ASD (X 2 (1, N = 1091) = 20.6,p = 5 × 10 −6 ), though not for BP or ADHD.Participants in the UCLA CNP dataset were scanned at two different sites [38] and participants in the ABIDE dataset were scanned across seventeen sites [126].
The ABIDE dataset is approximately ten-fold larger than any of the comparisons from the UCLA CNP dataset, and some studies have shown that classification performance with this dataset is inversely proportional to the number of participants included in the model [66][67][68].As we did not include site information as a confounding variable in our classifiers, nor did we attempt to perform cross-site alignment, we expect that our classification results for the ABIDE dataset are lower than could be obtained using a singular site or by applying cross-site harmonization.For a given brain region, the BOLD time series can be summarized using a set of univariate time-series features.
We opted to use the catch22 library of univariate properties [53], which are listed in Table 2 and described in more detail in Supplementary Table S1.The catch22 features were drawn from the broader hctsa [15,16] library, designed to be a highly explanatory subset of time-series features for generic time-series data mining problems [153] (though not defined specifically for neuroimaging datasets).To compare the performance of these features with a standard biomarker for quantifying localized resting-state BOLD activity, we also computed the fractional amplitude of low-frequency fluctuations (fALFF), which measures the ratio of spectral power in the 0.01-0.08Hz range to that of the full frequency range [56].fALFF is considered to be an index of spontaneous activity in individual brain regions and is fairly robust to the noise, scanner differences, and low sampling rates typical of resting-state fMRI data [166][167][168].The catch22 features were computed in R using the theft package (version 0.4.2;[169]) using the calculate_features function with the arguments feature_set = "catch22", which z-scores each time series prior to feature calculation.We also calculated the mean and standard deviation (SD) from the raw BOLD time series by including the catch24 = TRUE argument, which yielded a total of 24 univariate time-series features.We computed fALFF in Matlab as per the code implementation included in Fallon et al. [20].Collectively-the catch22 features, mean, SD, and fALFF-comprise 25 univariate features that encapsulate a diverse set of statistical time-series properties with which to compare across different clinical exemplars.These 25 time-series features were computed across all brain regions and participants, yielding a three-dimensional array in the form of N × R × F -comprised of N participants, R brain regions, and F time-series features.

Pairwise features
To summarize different types of pairwise coupling, or functional connectivity (FC), across pairs of brain regions, we used a subset of 14 SPIs from the pyspi library of pairwise interaction statistics [17].Each of these SPIs is Table 2.The 25 univariate time-series features computed for each brain region to capture intra-regional dynamics.For each univariate time-series feature, its shorthand name (referenced throughout the manuscript and in figures) as well as its name in 'catch22' [53] code implementation is listed.More information about each feature beyond its code implementation name is provided in Table S1.3. The 14 statistics for pairwise interactions (SPIs) computed across brain region pairs.For each SPI, its shorthand name (referenced throughout the manuscript and in figures) as well as its name in 'pyspi' code implementation is listed.More information about each SPI beyond its code implementation name is provided in Table S2.listed along with its code name in 'pyspi' in Table 3; more detailed descriptions are provided in Supplementary Table S2.These 14 SPIs contain a single representative from each of the 14 data-driven modules of similarly performing SPIs described in Cliff et al. [17].It includes the most commonly used FC metric for fMRI time series, Pearson correlation coefficient (denoted as 'cov_EmpiricalCovariance' in pyspi), enabling direct comparison of its performance to the other thirteen evaluated SPIs, which span nonlinear coupling (e.g., additive noise modeling), frequency-based coupling (e.g., power envelope correlation), asynchronous coupling (e.g., coherence magnitude), time-lagged coupling (e.g., dynamic time warping), and directed coupling (e.g., directed information).The 14 SPIs were computed from the BOLD time series for each pair of brain regions using the pyspi package (version 0.4.0)

Shorthand name pyspi code implementation
in Python [170].Given the large number of MTS matrices to process (across all participants), we distributed pyspi computations on a high-performance computing cluster.Each SPI yielded a matrix of pairwise interactions between all brain regions, with a total of 6642 possible pairs for UCLA CNP (from R = 82 brain regions) and 2256 pairs for ABIDE (from R = 48 brain regions), after omitting self-connections.As some SPIs (like the Pearson correlation) are undirected, the resulting FC matrix is symmetric, and only the set of coupling values for all unique brain-region pairs (corresponding to the upper half of the FC matrix) is retained for these SPIs.Across all SPIs, pyspi computations yielded a three-dimensional array in the form of N × P × S, comprised of N participants, P region pairs (noting that this value will differ for directed versus undirected SPIs), and S SPIs.

Case-control classification
We sought to comprehensively compare the ability of diverse intra-regional time-series features and inter-regional coupling strengths to separate healthy controls from individuals with a given clinical diagnosis.Univariate, intra-regional time-series features are sensitive to local, region-specific disruptions in neural activity, whereas inter-regional SPIs are sensitive to disrupted coupling between pairs of brain regions.We first asked whether case versus control separation would be better with: (i) multiple dynamical properties measured from a single brain region; (ii) a single time-series property measured across brain regions; or (iii) the combination (the full set of time-series properties across all brain regions).We then asked how pairwise FC measurements could capture case-control differences, and whether including our set of diverse univariate features would enhance SPI-wise classification performance.To compare the ability of univariate and/or pairwise features to separate case versus control individuals, we evaluated five different classification pipelines as enumerated in the following subsections, labeling each analysis as 'A X ' where X captures a short description of the analysis.These are listed as follows: 1.A region : The performance of individual brain regions given all F univariate time-series features, assessed via classifiers fit to N × F (participant × feature) matrices as depicted in Fig. 1D(i).R = 82 models are fit with this representation for SCZ, BP, and ADHD and 48 models for ASD.

Classifier fitting and evaluation
Each of the analyses enumerated in the previous sections (e.g., A region ) refers to a way in which we extracted a set of interpretable time-series features from an fMRI dataset to capture different types of dynamical statistics.We sought to compare how these feature-based representations could distinguish cases from controls in each neuropsychiatric disorder using a simple measure of classification performance.We evaluated classification using linear SVMs [171], which comprise a simple classical machine learning model that can handle large input feature spaces using regularization [172].As a robustness analysis, we subsequently evaluated classification using SVMs with an RBF kernel and random forest ensemble classifiers, finding that these two nonlinear classifiers did not substantially improve performance beyond the linear SVM kernel for A region , A feature , or A uni_combo across disorders (see Fig. S1).
Since the UCLA CNP cohort is characterized by class imbalances (such that there are more control participants than those with a given neuropsychiatric disorder), we applied inverse probability weighting such that samples from the minority class are assigned a larger weight when identifying the decision boundary [173].Additionally, we evaluated classifier performance using balanced accuracy [174], which measures the average between classifier sensitivity and specificity to reflect accuracy in both the majority and minority classes.Classifiers were evaluated using stratified 10-fold cross-validation (CV) such that the hyperplane was defined based on the training data and used to predict the diagnostic labels of the unseen test data for each fold.In order to reduce outlier-driven skews in time-series feature distributions, we normalized each training fold using the scaled outlier-robust sigmoidal transformation described in Fulcher and Jones [16] and applied the same normalization to each test fold to prevent data leakage [175].We confirmed that this reduced more distributional skew of many univariate and bivariate time-series features compared to z-score normalization as shown in Fig. S7.To ensure that results were not driven by any one specific CV split, each 10-fold CV procedure was repeated 10 times [176][177][178], yielding 100 out-of-sample balanced accuracy metrics per model.We report the mean ± SD tabulated across the 100 out-of-sample balanced accuracy metrics per model.
SVM classifiers were implemented in Python with the SVC function from scikit-learn [179]) with a default value of 1 for the regularization parameter, C and with the class_weight = 'balanced' argument for inverse probability weighting.The scaled outlier-robust sigmoidal transformation was applied using a custom Python script adapted from the Matlab-based mixedSigmoid normalization function from hctsa [15,16].We implemented 10 repeats of 10-fold CV using the StratifiedKFold function from scikit-learn as part of a 'pipeline', which also automatically applied normalization based on training fold parameters and evaluate balanced accuracy for each test fold.For each of the 10 repeats, we set the same random number generator state for StratifiedKFold such that all compared models were evaluated on the same resampled sets of test folds, which is depicted in fold-wise heatmaps in Fig. S8.
We also computed the area under the receiving operator characteristic curve (AUC) for each test fold and found that this yielded comparable results to the balanced accuracy in each case, so we only report balanced accuracy in this work [173].

Dimensionality reduction and feature selection
While A region and A feature both entail models with more samples than features, the other three representations-A uni_combo , A FC , and A FC_combo -involve models with more features than samples, which risks overfitting to accumulated noise [180].To test for such overfitting, we first evaluated two distinct methods for reducing the SVM feature space, focusing on analysis A uni_combo as a starting point: 1. Principal components analysis (PCA) for dimensionality reduction; and 2. L1 (LASSO) regularization for feature selection.
First, for each case-control comparison, we applied PCA to the region × feature matrices across all cases and the corresponding control cohort using the PCA function from the FactoMineR package in R (version 2.9) [181].We retained the first 25 principal components (PCs) to match the number of dimensions supplied to the linear SVMs for each brain region-i.e., the smallest classifier input space in our main analysis.Each case-control comparison was evaluated using these 25 PCs with the same classification procedure as described above.To complement this dimensionality reduction analysis, we also evaluated L1 (LASSO) regularization [182] which applies a penalty term that favors shrinking the coefficients for input features to zero.The L1 optimization algorithm aims to balance between minimizing the sum of the absolute values of the feature coefficients while also minimizing the loss (i.e., still fitting the data well).We implemented this regularization using the LinearSVC function from 'scikit-learn' (instead of the SVC function used for other analyses), keeping the same regularization parameter value (C = 1) and class weighting ('balanced' for inverse probability weighting).The same repeated 10-fold cross-validation procedure was applied as above.

Statistical significance
To make statistical inferences about how a given classifier performed relative to chance level, we calculated the probability of observing a given mean balanced accuracy relative to a null distribution.For each model, we fit 1000 SVM classifiers with randomly shuffled diagnostic class labels using the same 10-repeat 10-fold CV as described above-yielding 1000 null balanced accuracy estimates per model.After confirming that all evaluated models yielded approximately normal null distributions centered at 50% (i.e., chance) balanced accuracy (Fig. S9), we derived the mean and standard deviation from each null distribution.These estimated parameters were used to compare the observed balanced accuracy with the cumulative distribution function using the pnorm function in R to obtain a p-value, capturing the probability of obtaining a null balanced accuracy greater than or equal to the real observed balanced accuracy.To correct for multiple comparisons within each disorder, we applied Holm-Bonferroni correction [183] to control the family-wise error rate at the α = 0.05 level.All Holm-Bonferroni corrected p-values are denoted as p corr and the number of comparisons are indicated as appropriate.

Volumetric analysis of group differences
In order to test whether the balanced accuracy of a given brain region was related to gray matter volume differences in case-control comparisons, we compared the number of voxels in each brain region per participant in the UCLA CNP cohort (as volumetric data is not available for the ABIDE ASD cohort).We used the fmriprep anatomical derivative 'aparc+aseg' parcellation and segmentation volumes that were mapped to MNI152 space using nonlinear alignment and tabulated the number of voxels in each region.Of note, these segmentation volumes correspond to the Desikan-Killiany-Tourville atlas [184] in which voxels for three brain regions (frontal pole, temporal pole, and banks of the superior temporal sulcus) were redistributed into other regions-thus precluding volumetric analyses for those three regions.For each brain region, we regressed mean volume (measured in the number of voxels) on diagnosis for each case-control comparison with an ordinary least squares model, with the resulting β coefficients listed in Supplementary Table S9.We examined the magnitude of the β coefficient estimated for each brain region to evaluate whether volumetric differences relative to controls were related to that region's balanced accuracy.The Pearson correlation coefficient was computed to measure the linear association between β coefficient magnitude and classification performance for all 82 brain regions per disorder.Additionally, to evaluate whether the overall volume of a region was related to how well its resting-state activity distinguished cases from controls, we compared the mean number of voxels per region across participants per disorder with the disorder-wise mean balanced accuracy per region using the Pearson correlation coefficient.

Corrected t-statistics
To quantify the change in classifier performance for pairwise FC matrices with versus without the inclusion of univariate region × feature data (from A uni_combo ), we compared the balanced accuracy distributions across all 100 test folds for the corresponding two classifier inputs per SPI.The standard t-test for group means is suboptimal in this case, as the participant overlap across test folds in the two compared models violates the assumption of independent samples [185].We instead implemented a corrected test statistic T corr designed specifically for repeated k-fold cross-validation, as defined in [186] based on the original corrected T-statistic described in Nadeau and Bengio [187].This correction was applied using the repkfold_ttest function from 'correctR' package in R (version 0.1.3)[74].

Code and data availability
All data used in this study is openly accessible, with preprocessed data provided in Zenodo [188].UCLA CNP resting-state fMRI imaging data can be downloaded from OpenNeuro with accession number ds000030 (https://openfmri.org/dataset/ds000030/). ABIDE resting-state fMRI regional time series can be downloaded from Zenodo [52].All code used to compute univariate and pairwise time-series features, perform classification tasks, and visualize results is provided on GitHub and Zenodo [51].

3 .
A uni_combo [Fig.1D(iii)] represents each fMRI time series using a set of features combining the 25 time-series properties of each individual brain region.If this representation out-performs A region and/or A feature individually, that would suggest changes to different properties across different regions that could not be fully captured by either reduced representation.

Figure 1 .
Figure 1.In this study, we systematically compare different ways of quantifying dynamical patterns from resting-state fMRI time series, focusing on statistics of local regional dynamics and pairwise coupling across four neuropsychiatric disorders.A For a given resting-state fMRI volume (i), the cortex and subcortex are parcellated into individual regions from which the voxel-averaged BOLD signal time series is extracted (ii).Two key ways of quantifying dynamical patterns from these data are to: (iii) measure properties of the dynamics of individual brain regions (shown green); or (iv) to compute statistical dependencies between pairs of regions (pink and blue).B To evaluate the performance of different types of dynamical representations of an fMRI time-series dataset (for identifying disease-relevant changes in neural activity), we included four neuropsychiatric exemplars originating from two open-access datasets: the UCLA CNP 131 coupling from a given SPI (as A FC ) as well as 25 time-series properties of all individual brain regions 132 Bryant et al. | Extracting interpretable signatures of whole-brain dynamics | 4

Figure 2 .
Figure 2. Local properties of BOLD dynamics in individual brain regions can significantly distinguish cases versus controls.AWe consider the BOLD signal from an individual brain region (e.g., the caudal middle frontal gyrus shown in green) in which our set of 25 time-series features were computed.For each brain region, the values from all time-series features were organized into a feature × participant matrix, forming the basis for our case-control classification analysis at each individual region.B The balanced accuracy is shown for all brain regions as a raincloud distribution, where each point corresponds to the classification performance of a single region.The boxplot within each violin plot depicts the median and interquartile range for the region-wise classification performance within each neuropsychiatric disorder.C To visualize the spatial distribution of regional classification performance, mean case-control classification accuracies are depicted in brain maps for each disorder.Only regions that yielded a statistically significant balanced accuracy (pcorr < 0.05, corrected across 82 brain regions for SCZ, BP, and ADHD and 48 brain regions for ASD) are shaded, and the total number of significant brain regions per disorder is indicated above each brain map.Note that the Desikan-Killiany cortical and subcortical atlas is shown for SCZ (i), BPD (ii), and ADHD (iii) while the Harvard-Oxford cortical atlas is shown for ASD (iv).

Fig. 4B shows
Fig.4Bshows the mean balanced accuracy using this comprehensive representation of local dynamical properties across all brain regions as singular points in the middle row (A uni_combo ) for each disorder.We found that this representation distinguished cases from controls well for SCZ[67.7 ± 11.2%, permutation test p = 4 × 10 −39 , Fig. 4B(i)] and BP [68.4 ± 11.4%, p = 2 × 10 −44 , Fig. 4B(ii)], as indicated by the points located far from chance performance (shown as dashed lines at 50%).The strong performance in SCZ and BP was consistent with the large number of regions and features that yielded significant classification balanced accuracies on their own (cf.Figs 2B-C, 3B).The combination classifier also significantly distinguished ASD cases from controls (balanced accuracy 53.6 ± 4.7% [p = 1 × 10 −10 , Fig. 4B(iv)], although the low performance is likely attributable to the heterogeneity of the large multi-site sample [66-68].By contrast, the combined region × feature classifier did not distinguish ADHD patients from controls better than the null model [cf.Sec.4.3.3;49.8 ± 8.7%, p = 5 × 10 −1 , Fig. 4B(iii)].Combination classifier performance is summarized in Supplementary TableS5(for which the balanced Fig. 4B, to compare how well individual brain regions and time-series features compared with this combination classifier, we also depict the balanced accuracy for each individual brain region (top row, A region ) and each individual time-series feature (bottom row, A feature ) as a series of points.Figures 4B(i)-(ii) show that the combination of multiple intra-regional time-series Bryant et al. | Extracting interpretable signatures of whole-brain dynamics |

Figure 4 .
Figure 4. Representing a multivariate fMRI time series with whole-brain maps of univariate features improves classification performance for BP and SCZ, but not for ADHD or ASD-for which simpler representations perform better.A Starting with a multivariate resting-state BOLD time series, all F time-series features (in this case, 25 features)were computed across all R brain regions.All time-series feature values for all R brain regions were concatenated into one wide matrix, upon which the case-control classifier is evaluated for each disorder.B To compare the performance of this combination classifier with that of individual regions and time-series features, the mean balanced accuracy is shown for all brain regions (upper row, labeled A region ), the combination classifier (middle row, labeled A uni_combo ), and time-series features (bottom row, labeled A feature ).Lines are included as a visual guide to aid comparison.Colored dots indicate a classification model with significant balanced accuracy (pcorr < 0.05); gray dots indicate non-significant models.C To more directly compare performance across A region , A feature , and A uni_combo representations of whole-brain univariate dynamical properties (which each involve fitting different numbers of models), we implemented a strategy that either selects the top-performing brain region (A region ) or the top-performing time-series feature (A feature ) in each training fold, and evaluated the distribution of balanced accuracy across test folds, shown as violin plots here.For each disorder, the brain region and time-series feature that were most frequently selected based on in-sample performance are annotated, along with the number of test folds out of 100 for which it was selected.Each vertical line indicates the mean out-of-sample balanced accuracy across the 100 train-test splits.

Figure 5 .
Figure 5. Representing brain activity as the set of all pairwise functional connectivity strengths, A FC , can significantly distinguish cases from controls, with the classical Pearson correlation coefficient (capturing linear contemporaneous coupling) a top performing metric.A We compared 14 statistics of pairwise interactions (SPIs) (from pyspi [17], cf.Sec.4.2.2), as different ways of quantifying functional connectivity (FC) between pairs of brain regions.For a given SPI, each participant was represented by the set of corresponding FC values (calculated for each pair of brain regions), yielding a set of region-region pair values that can be stored as a one-dimensional vector per participant.These vectors were concatenated to yield a participant × region-pair matrix that formed the basis for case-control classification using a linear SVM.B Classification results are shown as a heatmap, with rows representing SPIs that yielded significant balanced accuracy (pcorr < 0.05, corrected across 14 SPIs) in at least one disorder, and columns representing each of the four disorders.Of the 14 SPIs we evaluated, eleven significantly distinguished cases from controls in at least one disorder, and are plotted here.The Pearson correlation coefficient is annotated in boldface for easier reference.C The SPI similarity score, |ρ SPI |, is visualized between each pair of the eleven SPIs from B as a heatmap, revealing six clusters of SPIs with similar behavior on the dataset (based on their outputs across all region-pairs and all disorders).As in B, the Pearson correlation coefficient annotation is shown boldface.D The disorder similarity score, ρ disorder , is depicted to compare the balanced accuracy values among all 14 SPIs between each pair of neuropsychiatric disorders; a large positive ρ disorder indicates a strong positive Spearman correlation in case-control classification performance across the 14 SPIs in the given pair of disorders.

Figure 6 .
Figure 6.Adding statistics of local regional dynamics enhances classification performance for many pairwise FC metrics.A Here we evaluate the effect of including brain-wide maps of local regional dynamics with pairwise coupling data for each SPI on case-control classification performance.By concatenating the full univariate region × feature matrix from A uni_combo with the full set of brain region pairs per SPI from A FC , this yielded a set of 14 linear SVM classifiers.BThe mean balanced accuracy is shown in heat map tiles for each SPI by row, organized into columns by neuropsychiatric disorder.ADHD is not included as none of the A uni_combo models yielded a significant balanced accuracy.C The mean balanced accuracy is shown for each SPI on its own (left, A FC ) and with the inclusion of univariate region × feature data (right, A FC_combo ).Points are colored to indicate whether the corresponding balanced accuracy was significant (green, pcorr < 0.05) or not significant (gray) relative to the corresponding null distribution (cf.Sec.4.3.3).Each line maps to one SPI to visually guide comparison across representation types.Lines are shaded darker to indicate a significant difference in SPI performance with the inclusion of localized dynamics (corrected t-test pcorr < 0.05, corrected across 14 SPIs; cf.Methods, Sec.4.3.5).

2 .
A feature : The performance of individual univariate time-series features measured from all brain regions, assessed via classifiers fit to N × R (participant × region) matrices as depicted in Fig.1D(ii).F = 25 such models are fit for all four disorders.3.A uni_combo :The performance of all F univariate time-series features computed across all R brain regions, assessed via classifiers fit to N × RF (participant × brain-region-feature) matrices as depicted in Fig.1D(iii).Only one such model is fit per disorder.4. A FC : The performance of the set of coupling strengths measured from all brain region pairs using an individual SPI, assessed via classifiers fit to N ×P (participant × region-pair) matrices, as depicted in Fig. 1D(iv).S = 14 such models are fit for each disorder.5.A FC_combo : The performance of an individual SPI measured from all brain region pairs with the addition of all F univariate time-series features, assessed via classifiers fit to N × P RF (participant × region-pair-region-univariate-feature) matrices as depicted in Fig. 1D(v).S = 14 such models are fit for each disorder.Bryant et al. | Extracting interpretable signatures of whole-brain dynamics | 23

Figure S5 .Figure S6 .
Figure S5.Linear dimensionality reduction and regularization approaches did not improve out-of-sample classification for the univariate region × feature classifiers, A uni_combo .A For each disorder, individual scores for the first two PCs are plotted, with points colored according to diagnosis.Shaded areas reflect convex hulls encapsulating all points for each diagnostic group.Note that each PCA was computed separately for each case-control comparison, so PC1 and PC2 scores are not directly comparable across clinical groups.B For each case-control comparison, we compare the out-of-sample balanced accuracy across the 100 repeats × folds using all region × feature variables (left, dark purple) versus using only scores for the first 25 PCs (right, light purple).Points are randomly jittered along the horizontal axis to aid visualization.Box plots are included in each half-violin to indicate the median and interquartile range of the distribution.C For each case-control comparison, we compare the out-of-sample balanced accuracy across the 100 repeats×folds using default regularization (left, dark green) versus L1 regularization (right, light green).

Figure S7 .
Figure S7.Comparing normalization methods supports the use of the outlier-robust mixed sigmoid method.Whole-brain univariate time-series feature value distributions are shown for all participants in the UCLA CNP cohort (A) or ABIDE cohort (B) with no normalization (upper row), z-score normalization (middle row), and outlier-robust mixed sigmoid normalization (bottom row; see Methods Sec.4.3.1 for description).Whole-brain pairwise SPI value distributions are shown for all participants in the UCLA CNP cohort (C) or ABIDE cohort (D) with no normalization (upper row), z-score normalization (middle row), and outlier-robust mixed sigmoid normalization (bottom row).

Table S8
).Of particular interest is the significant improvement in Φ * classification performance in BP, increasing from 54.1 ± 8.0% on its own, up to 67.3 ± 11.7% with the inclusion of Bryant et al. | Extracting interpretable signatures of whole-brain dynamics | 14 SCZ and BP were more accurately distinguished from control participants than ADHD or ASD All 14 SPIs we evaluated in A FC_combo (see Results Sec. 2.5) distinguished SCZ, BP, and ASD cases from controls Bryant et al. | Extracting interpretable signatures of whole-brain dynamics | 17 3.3 [136]dological limitations and considerations for future work such as graphical lasso[135].The linear SVM effectively employed a Euclidean distance space, although Venkatesh et al.[136]note that (non-Euclidean) geodesic distance metrics are more suitable for comparing FC matrices across individuals.While we did not find performance improvements using SVMs with an RBF kernel or random forest classifiers (see Methods Sec.4.3.1 for more details), it is possible that combining other nonlinear classifiers with hyperparameter optimization could more sensitively discriminate among features in a given disorder if nonlinear boundaries are present, warranting further study.
Bryant et al. | Extracting interpretable signatures of whole-brain dynamics | 183.4

Table 1 .
Key demographics for participants included in this study after quality control exclusion processes.