A Deep Graph Neural Network Architecture for Modelling Spatio-temporal Dynamics in Resting-state Functional MRI Data

Resting-state functional magnetic resonance imaging (rs-fMRI) has been successfully employed to understand the organisation of the human brain. For rs-fMRI analysis, the brain is typically parcellated into regions of interest (ROIs) and modelled as a graph in which each ROI is a node and pairwise correlations between ROI blood-oxygen-level-dependent (BOLD) time series are the edges. Recently, graph neural networks (GNNs) have seen a surge in popularity owing to their success in modelling unstructured relational data. The latest developments in GNNs, however, have not yet been fully exploited for the analysis of rs-fMRI data, particularly with regard to its spatio-temporal dynamics. Here we present a novel deep neural network architecture, combining both GNNs and temporal convolutional networks (TCNs), which is able to learn from the spatial and temporal components of rs-fMRI data in an end-to-end fashion. In particular, this corresponds to intra-feature learning (i.e., learning temporal dynamics with TCNs) as well as inter-feature learning (i.e., leveraging spatial interactions between ROIs with GNNs). We evaluate our model with an ablation study using 35,159 samples from the UK Biobank rs-fMRI database. We also demonstrate explainability features of our architecture which map to realistic neurobiological insights. We hope our model can lay the groundwork for future deep learning architectures focused on leveraging the inherently and inextricably spatio-temporal nature of rs-fMRI data.

…data, and brain signals in particular, interact nonlinearly [7,8].

To overcome such limitations, a different approach to the analysis of rs-fMRI data would be to devise a model that is able to combine both feature engineering and the learning of a low-dimensional representation of the brain. To do this, such a model would need to accommodate both the spatial inter-relationships between brain regions and the temporal dynamics within each region. Our architecture therefore combines graph neural networks (GNNs), which model the spatial inter-relationships between brain regions, with temporal convolutional networks (TCNs), which capture the intra-temporal dynamics of blood-oxygen-level-dependent (BOLD) time series. By incorporating GNNs and TCNs in the same end-to-end architecture we essentially combine intra- and inter-feature learning. In particular, GNNs can tackle a limitation of some graph representations of rs-fMRI data, in which association measures between different regions of interest (ROIs) of the brain are based on linear models; instead, GNNs can capture higher-order interactions between ROIs.

A very preliminary version of this work, with a 30-fold smaller dataset, was recently presented as a conference contribution [21]. However, further work was needed regarding a larger dataset, wider choices of the graph threshold hyperparameter, and an analysis of the inclusion of edge weights.

We test our architecture on the publicly available UK Biobank dataset, which at the time of writing provides rs-fMRI scans from more than 30,000 distinct people. This dataset offers a unique opportunity to formulate novel architectures, while supporting the need for large datasets for reproducible findings with minimal statistical errors [22]. We also conducted an ablation analysis on a proof-of-concept binary sex prediction task to better evaluate the contribution of each component of the architecture.

TCNs are based on dilated causal convolutions [25], which are special 1D filters whose receptive field grows exponentially over the temporal dimension of the data as the depth of the network increases. The padding of the convolution is 'causal' in the sense that an output at a specific time step is convolved only with elements from earlier time steps in the previous layers, thus preserving temporal order.
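As a minimal illustrative sketch (plain NumPy, not the paper's implementation), a dilated causal convolution can be written as a loop in which the output at time t only ever reads samples at t, t−d, t−2d, and so on:

```python
import numpy as np

def dilated_causal_conv(x, f, dilation=1):
    """Dilated causal 1D convolution of a time series x with filter f.

    The output at time t depends only on x[t], x[t-d], x[t-2d], ...,
    so temporal order is preserved (causal padding: indices before the
    start of the series are simply skipped, i.e. zero-padded).
    """
    T, K = len(x), len(f)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - dilation * k
            if idx >= 0:  # causal: never access future (or pre-start) samples
                y[t] += f[k] * x[idx]
    return y

# With kernel size K and dilation doubling per layer, the receptive field
# over L layers covers 1 + (K - 1) * (2**L - 1) time steps.
x = np.arange(8, dtype=float)
y = dilated_causal_conv(x, np.array([1.0, 1.0]), dilation=2)
```

Here `y[3]` equals `x[3] + x[1]`, illustrating that the filter taps are spaced `dilation` steps apart and never look forward in time.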
More formally, given a single ROI time series $x \in \mathbb{R}^T$ and a filter $f \in \mathbb{R}^K$, the dilated causal convolution of $x$ with $f$ at time $t$, with dilation factor $d$, is

$$(x *_d f)(t) = \sum_{k=0}^{K-1} f(k) \, x_{t - d \cdot k}. \quad (2)$$

Note that for rs-fMRI graph representations, each edge originally contains a single value (i.e., $e_k \in \mathbb{R}$), but after the edge-update function $\phi^e$ the resulting dimensionality can be different:

$$e'_k = \phi^e(e_k, v_{r_k}, v_{s_k}),$$

where $v_{r_k}$ and $v_{s_k}$ are the feature vectors of the receiver and sender nodes of edge $k$. Finally, the updated node features are computed using another aggregation function at the node level, for each node $i$:

$$v'_i = \phi^v\big(v_i, \rho^{e \to v}(\{e'_k\}_{r_k = i})\big).$$

The DiffPool operator, at layer $l$, thus receives both an adjacency matrix $A^{(l)}$ and a node embedding matrix $X^{(l)}$, and computes updated versions of both. To achieve this, the DiffPool operator uses a graph neural network (GNN) to learn a soft cluster assignment matrix

$$S^{(l)} = \mathrm{softmax}\big(\mathrm{GNN}_{\mathrm{pool}}(A^{(l)}, X^{(l)})\big) \in \mathbb{R}^{N^{(l)} \times N^{(l+1)}},$$

where $N^{(l)}$ is the number of nodes in layer $l$, $N^{(l+1)}$ the new number of nodes, each corresponding to a cluster ($N^{(l+1)} < N^{(l)}$), and $F'$ the number of features per node, which can be different from the original size $F$ of the matrix $X$.
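The DiffPool coarsening step described above can be sketched in a few lines of NumPy; here the learned GNN embedding and assignment logits are replaced by random matrices, purely to make the shapes and the matrix algebra concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

N_l, N_next, F = 6, 2, 4             # nodes, clusters, features per node
X = rng.standard_normal((N_l, F))    # stand-in for the node embedding X^(l)
A = (rng.random((N_l, N_l)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                          # symmetric adjacency, no self-loops

# Soft cluster assignment S^(l): row-wise softmax over clusters of a
# (here random) stand-in for the pooling GNN's output.
logits = rng.standard_normal((N_l, N_next))
S = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

X_next = S.T @ X       # X^(l+1), shape (N^(l+1), F)
A_next = S.T @ A @ S   # A^(l+1), shape (N^(l+1), N^(l+1))
```

Each row of `S` sums to one, so every node distributes its features (softly) across the new, smaller set of cluster nodes.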

The operator ends with the creation of the new node embedding matrix and adjacency matrix, to be inputted to the next layer:

$$X^{(l+1)} = S^{(l)\top} X^{(l)}, \qquad A^{(l+1)} = S^{(l)\top} A^{(l)} S^{(l)},$$

where $X^{(l+1)} \in \mathbb{R}^{N^{(l+1)} \times F'}$ and $A^{(l+1)} \in \mathbb{R}^{N^{(l+1)} \times N^{(l+1)}}$.

In order to assess the validity of our model, we performed a proof-of-concept evaluation through the well-known binary sex prediction task [39,40]. Table 1 shows the results of our ablation analysis; the corresponding results for non-thresholded graphs are shown in Table 2.

Table 1: Ablation analysis, with metrics averaged across the five test sets and standard deviations in parentheses. The aggregator appears on the right-hand side of the arrow; "N" corresponds to the node-only model, and "N + E" to the full Graph Network block. "Params" stands for the number of parameters.

(Table columns: Model, AUC, Accuracy, Sensitivity, Specificity, Params.)
Table 2: Results with non-thresholded graphs, with metrics averaged across the five test sets and standard deviations in parentheses. The aggregator appears on the right-hand side of the arrow; "N" corresponds to the node-only model, and "N + E" to the full Graph Network block. "Params" stands for the number of parameters.

The performance was lower for all cases which did not implement a threshold. For example, Figure 4 shows the weights learnt by the first TCN layer. We further designed a strategy to inspect the hierarchical spatial pooling that the architecture considered optimal while learning a particular prediction task; these aggregations can therefore be considered "optimal" for that task. For visualisation purposes, in a more traditional brain connectivity style, we selected the four main clusters defined by the dendrograms for the N + E → DiffPool model and overlaid these clusters on a brain surface in Figure 7.
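The graph-threshold hyperparameter discussed above can be realised, for instance, as a proportional threshold on the correlation matrix. The following is a hypothetical sketch of such a step (the function name and the `keep` parameter are ours, not taken from the paper's code):

```python
import numpy as np

def threshold_graph(corr, keep=0.2):
    """Keep only the strongest `keep` fraction of off-diagonal correlations.

    Edges whose absolute correlation falls below the cutoff are zeroed,
    producing a sparser adjacency matrix; the diagonal is also zeroed.
    """
    n = corr.shape[0]
    iu = np.triu_indices(n, k=1)
    cutoff = np.quantile(np.abs(corr[iu]), 1.0 - keep)
    adj = np.where(np.abs(corr) >= cutoff, corr, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj

# Toy 3-ROI correlation matrix: only the strongest edge (0.9) survives
# when roughly a third of edges are kept.
corr = np.array([[1.0, 0.9, 0.1],
                 [0.9, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
adj = threshold_graph(corr, keep=0.34)
```

A proportional (percentile-based) cutoff has the advantage over an absolute one that all subjects' graphs end up with the same edge density.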

Figure 6: Upper triangle of the association matrix S for the N + E → DiffPool model, generated when predicting binary sex on unthresholded matrices, with dendrograms from hierarchical clustering. Each element S_{i,j} indicates how many times brain regions i and j are pooled together. In the lower-left corner, a graph representation of the same association matrix S, thresholded at 50%, with nodes identified and coloured according to their general brain region (i.e., T/F/O/P/I correspond to Temporal, Frontal, Occipital, Parietal, and Insula); thicker edges represent a higher S_{i,j} value, in this graph representation ranging from 20,256 to 34,565.
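An association matrix of this kind can be accumulated from repeated hard cluster assignments (e.g., across cross-validation folds or samples). The sketch below uses made-up assignments purely for illustration:

```python
import numpy as np

def association_matrix(assignments):
    """Count how often each pair of regions falls in the same cluster.

    assignments: list of 1D integer arrays, one cluster label per region.
    Returns a symmetric matrix S with S[i, j] = number of runs in which
    regions i and j were pooled together (diagonal set to zero).
    """
    n = len(assignments[0])
    S = np.zeros((n, n), dtype=int)
    for labels in assignments:
        same = labels[:, None] == labels[None, :]  # pairwise co-membership
        S += same.astype(int)
    np.fill_diagonal(S, 0)
    return S

# Three toy runs over four regions.
runs = [np.array([0, 0, 1, 1]),
        np.array([0, 1, 1, 0]),
        np.array([0, 0, 0, 1])]
S = association_matrix(runs)
```

Hierarchical clustering of such a co-occurrence matrix then yields the dendrograms shown in the figure.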

When looking at how the GNNs clustered the brain regions to optimise and achieve the best sex prediction, we found that a clustering into four sets of brain regions showed interesting properties in terms of neurobiological interpretability. More specifically, the brain regions were grouped in a manner that mirrors, to a certain degree, the well-known cytoarchitectural and functional […] model has performed to achieve optimal sex classification is fully relevant to mechanistically describe the sex differences that are seen at the behavioural level.

460
These results demonstrate the explainability capacity of our model when using the DiffPool aggregator, which is able to cluster brain regions in a way that is specific to the classification task at hand.

Conclusion
In this paper we presented a novel deep learning architecture which can learn from both the spatial and temporal components of rs-fMRI data in an end-to-end fashion.

The code used to conduct the analysis described in this paper is publicly available at https://github.com/tjiagoM/spatio-temporal-brain.