Subtypes of functional connectivity associate robustly with ASD diagnosis

Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data-driven clustering offers a straightforward way to decompose this heterogeneity into subtypes of distinguishable connectivity types and promises an unbiased framework to investigate behavioural symptoms and causative genetic factors. Yet the robustness and generalizability of these imaging subtypes are unknown. Here, we show that unsupervised functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and that these associations generalize to independent replication data. We found that subtypes identified robust patterns of functional connectivity, but that a discrete assignment of individuals to these subtypes was not supported by the data. Our results support the use of data-driven subtyping as a data dimensionality reduction technique, rather than to establish clinical categories.

Figure 1. Robustness of subtyping outcomes across brain networks. Left: Stability of the FC subtype maps. Boxplots represent the range of the average similarity between FC subtype maps of the same brain network that were extracted from separate subsamples of the discovery dataset. Middle: Stability of discrete assignments of individuals to a FC subtype cluster. Boxplots represent the average overlap between the clusters an individual was assigned to in two different random subsamples. Right: Stability of continuous assignments of individuals to a FC subtype across repeated imaging sessions. Bar plots represent the average intraclass correlation between continuous subtype assignments computed on separate longitudinal imaging sessions. Different bar hues represent the stability of continuous subtype assignments extracted from out-of-sample subtypes (black), from within-sample subtypes (dark blue), and from within-sample subtypes in a general population data set where multiple scan sessions were combined to compute continuous subtype assignments (lighter shades of blue reflect more combined sessions).

Subtype maps are stable

We first aimed at evaluating the robustness of subtype maps. Subtype maps are the spatial FC profiles corresponding to each identified subtype in the brain. For this purpose, we repeated the subtype analysis on random subsamples of 50% of the discovery dataset. We then matched the subtype maps of each seed network across subsamples, based on the highest similarity between pairs of maps. The average spatial Pearson correlation between matched subtype maps was 0.65 (0.034) across all seed networks and subsamples. We observed small variations across seed networks: from 0.58 (0.081) for the inferior temporal gyrus seed network up to 0.7 (0.069) for the dorsal motor network (see Figure 1). We thus showed that the subtype maps of the identified subtypes were robust to random perturbations in the dataset.
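The matching step can be illustrated with a minimal sketch. It assumes that the subtype maps of two subsamples are available as arrays of shape (number of subtypes, number of voxels); the function name `map_stability` and the array layout are hypothetical and are not taken from the original analysis code.

```python
import numpy as np

def map_stability(maps_a, maps_b):
    """Average best-match spatial correlation from subtype maps in A to maps in B.

    maps_a: array (n_subtypes_a, n_voxels); maps_b: array (n_subtypes_b, n_voxels).
    Both layouts are assumptions made for this sketch.
    """
    maps_a = np.asarray(maps_a, dtype=float)
    maps_b = np.asarray(maps_b, dtype=float)
    # z-score each map across voxels so that a scaled dot product equals Pearson r
    za = (maps_a - maps_a.mean(axis=1, keepdims=True)) / maps_a.std(axis=1, keepdims=True)
    zb = (maps_b - maps_b.mean(axis=1, keepdims=True)) / maps_b.std(axis=1, keepdims=True)
    corr = za @ zb.T / maps_a.shape[1]
    # matching with replacement: one map in B may be the best match of several maps in A
    best = corr.max(axis=1)
    return best.mean(), best
```

Because the number of subtypes can differ between subsamples, the sketch takes the row-wise maximum rather than solving a one-to-one assignment problem.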

Discrete individual subtype assignments are not stable

To evaluate the robustness of discrete assignments of individuals to a subtype, we compared the [...]

Subtypes are robust to nuisance covariates and parameter changes

We then conducted the FC subtype analysis in the discovery dataset for each seed network. FC subtypes were identified according to two criteria: an average spatial dissimilarity below 1, and a minimum number of 20 individuals within each subtype. Across all seed networks, we identified 87 FC subtypes, with an average of 5 per network. These FC subtypes captured on average 97% of individuals in the sample (see also Appendix 1). We tested whether continuous subtype assignments were driven by head motion, age, or recording site and found no significant linear associations with these covariates (see also Appendix 1). Lastly, we evaluated whether our results were influenced by the choice of the dissimilarity threshold by repeating the subtyping and subsequent analysis steps for different levels of dissimilarity thresholds. We found that our results were robust across dissimilarity thresholds but that higher thresholds led to the inclusion of smaller proportions of the sample (see Fig 1 in [...]

[...] measures of ASD symptom severity (i.e. calibrated ADOS severity scores).

We identified 11 FC subtypes for which the continuous assignment of individuals were signifi[...] individuals showed significantly stronger assignments than ASD individuals with 5 of the 11 sub[...] showed that a subset of the identified FC subtypes naturally captured some variance of the clinical ASD diagnosis. We did not find an association between continuous subtype assignments and ASD symptom severity beyond the effect of the clinical diagnosis (see Appendix 2).
Subtype associations with ASD diagnosis replicate moderately

We next investigated how reproducible the discovered association between FC subtypes and ASD diagnosis was in an independent dataset. For each of the subtypes that showed a significant association with ASD diagnosis in the discovery dataset, we computed the continuous assignment for the individuals in the independent replication dataset. In this way, we tested the out of sample reproducibility of the observed association effect. We tested different degrees of replication: whether the observed effect in the replication sample was significant after correction for multiple compar[...] two of those were significant at p < 0.05 (Figure 2 b). We thus showed that the association between subtypes and ASD diagnosis observed on the discovery dataset was moderately replicable in the independent replication dataset.
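The four degrees of replication listed for Figure 2 b (significance after FDR correction, uncorrected p < 0.05, effect within the discovery confidence interval, and same direction of effect) can be sketched as follows. The Benjamini-Hochberg procedure is assumed here as the FDR method because the text does not name one, and all function and argument names are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of p-values surviving Benjamini-Hochberg FDR control at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank that meets its threshold
        reject[order[: k + 1]] = True
    return reject

def replication_degrees(disc_effect, disc_ci_low, disc_ci_high, rep_effect, rep_p, q=0.05):
    """Evaluate each degree of replication per subtype (one array entry per subtype)."""
    disc_effect = np.asarray(disc_effect, dtype=float)
    rep_effect = np.asarray(rep_effect, dtype=float)
    rep_p = np.asarray(rep_p, dtype=float)
    return {
        "FDR": benjamini_hochberg(rep_p, q),
        "p < 0.05": rep_p < 0.05,
        "effect within CI": (rep_effect >= np.asarray(disc_ci_low))
        & (rep_effect <= np.asarray(disc_ci_high)),
        "direction": np.sign(rep_effect) == np.sign(disc_effect),
    }
```

The criteria are ordered from strictest to most lenient, so a subtype can satisfy a lenient criterion without satisfying the strict ones.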

Subtypes with similar risk for ASD show similar spatial patterns of FC alterations

We noticed that the spatial pattern of protective subtype maps appeared similar, despite representing connectivity profiles from different seed networks (Figure 3 a, b). Similarly, the subtype maps of risk subtypes all appeared to show below average connectivity. We therefore investigated whether subtypes with the same direction of association with ASD diagnosis (i.e. protective and risk subtypes) shared similar FC profiles and whether this also extended to the continuous assignments of individuals to these subtypes. We found that protective subtypes exhibited a highly convergent [...]

[...] Error bars reflect the 95% confidence interval of the effect size estimates. The effect size observed in the independent replication data set is shown as a blue dot. b) Matrix showing the degree of replication in the independent replication dataset of the observed association with diagnosis for each of the 11 protective and risk subtypes. Each row corresponds to a bar-plot in a). From top to bottom, the degrees of replication are: FDR: full replication of the effect after FDR correction, p < 0.05: replication of the effect for uncorrected statistics, effect within CI: observed effect size in the replication sample falls within the 95% confidence interval of the observed effect in the discovery sample, direction: observed effects in the discovery and independent replication sample go in the same direction. c) Graph illustrating the similarity of continuous subtype assignments across risk and protective subtypes. The average continuous subtype assignments of the top 10% of individuals with the highest similarity with a protective (green shades) or risk (red shades) subtype are displayed across all identified protective (left side) and risk (right side) subtypes. An individual may belong to the top 10% in more than one subtype.
d) Correlation plot of the observed effect sizes in the discovery and independent replication datasets. The black line represents the correlation of effect sizes, the grey shaded area reflects the estimated 95% CI of the linear fit.

In the wider ASD literature, the robustness of discrete subtype assignments has been more comprehensively investigated for symptom based subtypes. Several symptom based subtypes of autism have been proposed in attempts to provide more homogeneous diagnostic criteria. However, the distinction between these subtypes was also not found to be well supported by replica[...]

We may reconcile the seemingly conflicting findings of robust subtypes on the one hand and [...] subtypes provide a better representation of the data.

Subtypes moderately, but reproducibly, associate with ASD diagnosis

The majority of previous subtyping analyses in ASD have been constrained to patients that were [...]

The comparison with the large case-control study by Holiga and colleagues may also serve to illustrate the conceptual advantage of a subtyping approach over the traditional case-control de[...]

[...] (Rashid et al., 2018). It is therefore possible that by averaging longer time series for each individual, we get a better approximation of that individual's preferred dynamic state. A promising direction for future research will be the investigation of dynamic FC subtypes in ASD. Datasets providing longer time series per individual will facilitate these inquiries.

Our results have focused only on individuals with ASD. Given the extensive evidence of overlap of symptoms (Grzadzinski et al., 2011) and neurobiological phenotypes between ASD and other neurodevelopmental disorders (Sha et al., 2019), a fruitful avenue for future research will be to extend this approach to investigate cross-diagnostic subtypes of FC (Elliott et al., 2018).

Our findings suggest that unsupervised clustering of heterogeneous imaging data is well suited to [...]

[...] found to pass our quality control criteria. We selected the two imaging sites with the largest num[...]

Quality control of imaging data

We controlled the quality of preprocessed data manually and through quantitative cut-off values. [...]

[...] thus contained values between 0 (no dissimilarity or a spatial correlation of 1) and 2 (perfect dissimilarity or a spatial correlation of -1), with 1 denoting no spatial relationship (a spatial correlation of 0).

For each seed network separately, we characterized communities of individuals with similar seed FC maps by hierarchical agglomerative clustering of the dissimilarity matrix using the unweighted average distance linkage criterion (Müllner, 2011). We applied two criteria for the identification of seed FC communities: 1) the average dissimilarity between seed FC maps in a community could not be greater than 1, and 2) the community had to have at least 20 members. As a consequence, small subsets of individuals with distinct seed FC patterns could remain unassigned to any community. Assigning individuals to subtypes in this way is a discrete process and we therefore refer to these assignments as discrete subtype assignments.
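A minimal sketch of this clustering step, using SciPy's hierarchical clustering, is given here for illustration. It assumes that seed FC maps are stored as an array of shape (individuals, voxels), and it approximates the first criterion by cutting the average-linkage tree at a height equal to the maximum dissimilarity; whether the original pipeline applied the criterion in exactly this way is an assumption. The function name `discrete_subtypes` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def discrete_subtypes(seed_maps, max_dissimilarity=1.0, min_members=20):
    """Return one label per individual; 0 means 'not assigned to any subtype'."""
    seed_maps = np.asarray(seed_maps, dtype=float)
    # dissimilarity = 1 - spatial Pearson correlation, ranging from 0 to 2
    dissimilarity = 1.0 - np.corrcoef(seed_maps)
    np.fill_diagonal(dissimilarity, 0.0)
    condensed = squareform(dissimilarity, checks=False)
    # unweighted average linkage (UPGMA)
    tree = linkage(condensed, method="average")
    # assumption: cut the tree where the average between-cluster dissimilarity
    # exceeds the threshold
    raw = fcluster(tree, t=max_dissimilarity, criterion="distance")
    labels = np.zeros(raw.size, dtype=int)
    next_label = 1
    for cluster_id in np.unique(raw):
        members = raw == cluster_id
        if members.sum() >= min_members:  # criterion 2: minimum community size
            labels[members] = next_label
            next_label += 1
    return labels
```

Individuals in clusters smaller than `min_members` keep the label 0, which corresponds to not being assigned to any community.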

Within each seed FC community, we estimated the average seed FC map across all community members. This map reflected the subtype of seed FC shared by the community members and we refer to these maps as the subtype map. [...] to +1 (perfect correlation of the individual and subtype seed FC map).
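The two quantities described here, the subtype map and the correlation-based continuous assignment of an individual to it, can be sketched as follows. The sketch assumes seed FC maps of shape (individuals, voxels) and discrete labels in which 0 marks unassigned individuals; both conventions and the function names are hypothetical.

```python
import numpy as np

def subtype_maps(seed_maps, labels):
    """Average the seed FC maps of the members of each community (label 0 = unassigned)."""
    seed_maps = np.asarray(seed_maps, dtype=float)
    labels = np.asarray(labels)
    ids = [k for k in np.unique(labels) if k != 0]
    return np.stack([seed_maps[labels == k].mean(axis=0) for k in ids])

def continuous_assignments(seed_maps, maps):
    """Spatial Pearson correlation of every individual map with every subtype map.

    Returns an array of shape (n_individuals, n_subtypes) with values in [-1, +1].
    """
    x = np.asarray(seed_maps, dtype=float)
    m = np.asarray(maps, dtype=float)
    zx = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    zm = (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)
    return zx @ zm.T / x.shape[1]
```

Unlike the discrete labels, these scores are defined for every individual and every subtype, including subtypes identified in a different sample.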

Before we investigated the three aspects of FC subtypes (subtype maps, and discrete and continuous assignments) in detail, we wanted to determine the robustness of these metrics to perturbations of the discovery data. We used two approaches: 1) to determine the robustness of discrete subtype assignments and subtype maps, we conducted a stratified subsampling scheme on our discovery sample, 2) to determine the robustness of continuous subtype assignments, we computed the within subject stability of continuous subtype assignments across repeated scan sessions for individuals in the longitudinal sample.

We randomly selected 1000 stratified subsamples of half of our discovery sample while preserv[...] B. If subtype maps were robustly identified, then we would expect that for each subtype map in sample A we can find at least one subtype map in sample B that is very similar. We therefore searched (with replacement) for each subtype map in sample A the subtype map in sample B with the highest spatial correlation. Since the number of subtypes extracted in each subsample was determined by the data, we allowed for subtype maps in sample B to be a match for multiple subtype maps in sample A. We then took the average of the maximal spatial similarity between subtype maps of sample A and B as a measure of the robustness of the subtype maps.

We computed the robustness of the continuous subtype assignments as the intraclass correlation coefficient between repeated scan sessions of the same individual. We first investigated the robustness of assignments to subtypes that had been identified on data from a separate scan session but of the same sample (within sample robustness [...] subtype assignments between ASD patients and NTC was found in the discovery sample.
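As an illustration of the intraclass correlation between repeated sessions, the sketch below computes a one-way random effects ICC, often written ICC(1,1). The text does not state which ICC form was used, so this choice is an assumption, as are the function name and the (individuals, sessions) layout of the input.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random effects ICC(1,1) for an array of shape (n_individuals, k_sessions).

    Assumption: the specific ICC variant is not given in the text; ICC(1,1) is used
    here purely for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    subject_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # between-individual and within-individual mean squares
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

In this setting each row would hold the continuous assignment of one individual to one subtype, computed separately for each scan session.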

Within the set of subtypes that showed a significant association with ASD diagnosis we investi[...]

We tested the replicability of the associations between seed FC subtypes and ASD diagnosis in an independent replication sample. Within the replication sample we computed individual seed FC maps for the 18 non-cerebellar MIST_20 seed networks, centered the seed FC maps to the replication sample group average and regressed variance of non-interest due to age, head motion and imaging site for each voxel. For the residual seed FC maps, we computed the continuous subtype assignment scores with the subtypes identified in the discovery sample. For those subtypes that showed significant associations with ASD diagnosis in the discovery sample, we then investigated the difference in continuous subtype assignment scores between ASD and NTC individuals in the replication sample.
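The centering and voxel-wise nuisance regression can be sketched with an ordinary least squares fit. The sketch assumes that covariates are supplied as a numeric matrix, with imaging site already dummy coded by the caller; the function name `regress_out` and this encoding are hypothetical and not taken from the original pipeline.

```python
import numpy as np

def regress_out(seed_maps, covariates):
    """Remove the linear effect of the covariates from every voxel and return residuals.

    seed_maps: array (n_individuals, n_voxels)
    covariates: array (n_individuals, n_covariates), e.g. age, head motion and
    dummy-coded site columns (the encoding is an assumption of this sketch).
    """
    y = np.asarray(seed_maps, dtype=float)
    x = np.asarray(covariates, dtype=float)
    y = y - y.mean(axis=0)  # center each voxel on the group average map
    design = np.column_stack([np.ones(len(x)), x])
    # one least squares fit solves all voxels at once (one column of y per voxel)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta
```

The residual maps can then be correlated with the discovery subtype maps to obtain continuous assignment scores for the replication sample.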

Robustness of findings to changes in the subtyping pipeline

Although we did not explicitly specify the number of subtypes to be identified for each seed network, it was implicitly determined by the maximum dissimilarity parameter and the structure of the subject by subject dissimilarity matrix. In order to understand how robust our findings were to changes in this parameter, we repeated all analysis steps (i.e. the identification of subtypes, the test [...]

[...] disorder and autism spectrum disorder. Transl Psychiatry. 2018 Jul; 8(1):133.
Subtypes for each seed network were identified according to two criteria: the average spatial dissimilarity within a subtype was no larger than 1 and at least 20 individuals were part of the subtype. Across all 18 seed networks, we identified 87 FC subtypes in the discovery dataset. In each seed network we identified between 3 (medial visual network) and 6 subtypes (lateral visual network) that satisfied these criteria (the median number of subtypes was 5). On average across networks, 97% of the individuals in the discovery dataset were assigned to a subtype (see also Figure 5). The largest number of individuals not assigned to any subtype was 19 in the inferior temporal gyrus network, and all individuals were assigned to subtypes in the ventral somatomotor and perigenual anterior cingulate seed networks. The average number of individuals in a subtype was 79.4 (13.2). We thus show that the majority of individuals in the discovery dataset contributed to the identified 87 FC subtypes.

To ensure that subtypes were not driven by variation of non-interest, we tested for linear associations between the continuous assignment of individuals to each subtype and both head motion and age by Pearson correlation. We also tested whether recording sites were overrepresented in subtypes above chance level with a chi-square test. We found no significant linear relationship between continuous assignments of individuals to subtypes and either in-scanner head motion or age in any seed network. In addition, we found that the distribution of imaging sites across subtypes did not differ significantly from chance. We thus show that subtypes in the discovery dataset were not significantly driven by variance sources of non-interest.
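The covariate checks described above can be sketched with SciPy. The sketch assumes one vector of continuous assignments for a single subtype, numeric head motion and age values, and categorical site and discrete subtype labels per individual; the function name and the returned dictionary keys are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr, chi2_contingency

def covariate_checks(assignments, head_motion, age, site, discrete_labels):
    """Pearson tests for motion and age, and a chi-square test of site by subtype."""
    r_motion, p_motion = pearsonr(assignments, head_motion)
    r_age, p_age = pearsonr(assignments, age)
    # build the site x subtype contingency table from the categorical labels
    sites, site_idx = np.unique(site, return_inverse=True)
    subs, sub_idx = np.unique(discrete_labels, return_inverse=True)
    table = np.zeros((sites.size, subs.size), dtype=int)
    np.add.at(table, (site_idx, sub_idx), 1)
    chi2, p_site, dof, _ = chi2_contingency(table)
    return {
        "motion": (r_motion, p_motion),
        "age": (r_age, p_age),
        "site": (chi2, p_site, dof),
    }
```

In practice these tests would be repeated for every subtype of every seed network, so the resulting p-values would also need correction for multiple comparisons.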
We identified subtypes of FC that satisfied two criteria: a maximal average dissimilarity of the connectivity patterns of individuals contributing to the subtype, and a minimal number of individuals. Although this process did not explicitly specify the number of subtypes to be identified, we sought to understand how robust our findings were to changes in the subtyping criteria. We therefore repeated the complete subtype analysis (i.e. identification of subtypes, association with ASD diagnosis, and generalization on independent data) for different values of maximal within-subtype dissimilarity. This analysis revealed that subtype maps remained highly similar across different values of the dissimilarity criterion, with subtypes for the most part contracting and only rarely splitting into subcomponents (see Figure 4). We found highly consistent spatial patterns of protective and risk subtypes, and the associations with clinical ASD diagnosis generalized to the independent replication dataset at equal rates. We thus conclude that our findings were robust to changes in the parameters of the subtyping analysis.