## Abstract

Recent machine learning techniques have improved the modeling of complex dependencies between brain connectivity and cognitive/behavioral traits, facilitating connectome-based predictions. However, they typically require large datasets. While large open datasets such as the Human Connectome Project have offered significant benefits to connectomics research, collecting data at that scale remains a challenge due to financial cost and time. To address this issue, we propose Task-guided GAN II, a novel data augmentation method leveraging generative adversarial networks (GANs) to enlarge the effective sample size of limited datasets for connectome-based prediction tasks. Unlike previous approaches, our method incorporates a task-guided branch within the conventional Wasserstein GAN framework, specifically designed to synthesize structural connectivity matrices. It aims to effectively augment data and improve the prediction accuracy of human cognitive traits by capturing more task-directed features within the data. We evaluated the effectiveness of data augmentation with Task-guided GAN II in predicting fluid intelligence using the NIMH Healthy Research Volunteer Dataset. Our results demonstrate that data augmentation with Task-guided GAN II not only improves prediction accuracy but also yields a latent space that effectively captures correlations between structural connectivity and cognitive outcomes. Our method should benefit human connectomics research that must leverage small datasets.

## 1. Introduction

Connectomics, which represents neural circuits as networks and analyzes their topology through mathematical approaches, has emerged as an essential methodology in cognitive and computational neuroscience [1], [2]. It facilitates a deeper understanding of the complex relationships among brain networks, cognition, behavior, and individual differences in human cognitive traits and behaviors. Additionally, recent advances in machine learning methodologies have employed connectome-based machine learning models using structural and functional connectivity matrices to predict behavior or cognitive traits [3].

Recent studies have revealed that building reliable prediction models requires more than a hundred samples [4], [5], and that accuracy and robustness improve as the sample size grows, regardless of the choice of regression model [6]. Therefore, researchers have often used large open datasets such as the Human Connectome Project (HCP) dataset (*N* = 1200) [7], the ABCD Study (*N* = 11878) [8], and the UK Biobank (*N* = 40000+) [9]. These datasets have contributed significantly to developing analytical methods and understanding human connectomics [10]. However, Yeung et al. reported that most self-recruited samples include only around 100 subjects [5]. In the exploratory stage of research, it is difficult for individual researchers to collect more than 100 samples due to limitations of time and cost.

One of the most promising solutions to the small-sample problem is data augmentation, which generates new samples by manipulating a given small dataset. Traditional data augmentation for image data often applies geometric transformations, such as rotation or flipping, to the original data. However, such manipulations are inapplicable to the matrix-shaped data of brain functional and structural connectivity because their shape is constrained by the ordering of brain regions. In recent years, data augmentation methods using generative models have been proposed [11], [12], [13], [14], [15], [16], [17]. In these methods, a generative model approximates the distribution of a given dataset and synthesizes new samples with characteristics similar to the original data. In particular, generative adversarial networks (GANs) have been increasingly used for data synthesis in numerous application fields owing to their strength in learning both the local and global structure of data [18]. In the neuroscience field, Chao Li et al. proposed a structural connectivity augmentation method named BrainNetGAN to build a classification model for Alzheimer's disease [12]; the accuracy of their graph neural network-based classifier improved when trained with structural connectivity data synthesized by BrainNetGAN. Ruizhe Li et al. proposed a data augmentation method for T1-weighted images named Task-guided GAN (TG GAN) and showed that it improved the accuracy of a brain age prediction model [13]. In their TG GAN, a task-guided branch implementing the brain age regression model was incorporated into the GAN architecture, which made data synthesis more task-specific and contributed to improved prediction.

There are many studies on data augmentation methods for medical images, but few on brain connectome augmentation for connectome-based prediction tasks. The BrainNetGAN mentioned above was developed for a classification task and trained with specific class labels to synthesize conditioned data. The TG GAN, on the other hand, was designed with a 3D image input structure specifically for T1-weighted image synthesis. Neither model is suitable for synthesizing connectivity matrices aimed at prediction tasks. Given the rapidly increasing interest in connectome-based prediction studies, there is an urgent need for a data augmentation model tailored to these tasks. Here, we propose a novel data augmentation method, Task-guided GAN II (TG GAN II). It is designed to synthesize structural connectivity matrices and employs a task-guided branch that predicts human cognitive traits from structural connectivity. We aim to enhance the accuracy of predicting cognitive traits from structural connectivity by augmenting the original dataset with data synthesized by our TG GAN II. To assess the effectiveness of our method, we use the NIMH Healthy Research Volunteer Dataset [19] to build a model predicting fluid intelligence from structural connectivity. The TG GAN II employs a Wasserstein GAN (WGAN) without task-guided branches as its baseline model and is evaluated against this baseline from three perspectives: 1) the similarity of graph features between synthesized and original data, 2) the improvement in prediction performance, and 3) the influence of the volume of added synthetic data on the improvement in prediction performance. We hope that these investigations will make a substantial contribution toward addressing the sample size challenges in connectome-based prediction tasks.

## 2. Methods

### 2.1 Baseline model: Wasserstein GAN with gradient penalty (WGAN-GP)

We employed a Wasserstein GAN with gradient penalty (WGAN-GP) [20] as the baseline model for our proposed TG GAN II. The WGAN-GP incorporates the Wasserstein distance into its objective function to improve the stability of model training by suppressing mode collapse. It consisted of a generator that synthesized new structural connectivity matrices and critics that assessed the synthesized matrices by estimating the Wasserstein distance between the synthesized and given ones. We used an autoencoder-based generator in our WGAN-GP model to synthesize new matrices by interpolating the latent feature space (**Figure 1**).

The encoder in the generator module extracted features from the given connectivity matrices and mapped them into the latent space, and the decoder module synthesized matrices from the latent variables. For the encoder, we employed the convolutional neural network for brain networks (BrainNetCNN) proposed by Kawahara et al. [21], which consisted of four convolutional layers and two fully connected layers. The BrainNetCNN had three convolutional layer types specialized for extracting topological features of connectivity matrices: edge-to-edge (E2E), edge-to-node (E2N), and node-to-graph (N2G) convolutions (**Figure 2**). For the decoder, we adopted a simple feed-forward neural network with five fully connected layers. In addition, we used the softplus function as the activation function in the final layer because each element of a structural connectivity matrix is non-negative. The decoder output a vector of the lower-triangle part of the connectivity matrix, which was then matricized. We also utilized the BrainNetCNN architecture with four convolutional layers for the critics.
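The decoder's final step can be sketched as follows: a softplus activation enforces non-negativity, and the lower-triangle vector is matricized into a symmetric connectivity matrix (a minimal numpy illustration; the region count and input values are arbitrary, and the real decoder is a trained five-layer network).

```python
import numpy as np

def softplus(x):
    # smooth non-negative activation: log(1 + exp(x))
    return np.log1p(np.exp(x))

def vector_to_symmetric_matrix(vec, n_regions):
    """Rebuild a symmetric connectivity matrix from its strict
    lower-triangle vector (the diagonal is assumed to be zero)."""
    mat = np.zeros((n_regions, n_regions))
    rows, cols = np.tril_indices(n_regions, k=-1)
    mat[rows, cols] = vec
    return mat + mat.T  # mirror onto the upper triangle

# toy example: 4 regions -> 6 lower-triangle edges
raw_output = np.array([-1.0, 0.0, 1.0, 2.0, -2.0, 0.5])
conn = vector_to_symmetric_matrix(softplus(raw_output), n_regions=4)
```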

The loss function for the generator and critics in our WGAN-GP is presented in Equation 1. The critics aimed to maximize this equation to accurately estimate the Wasserstein distance between the synthesized and original matrices. Conversely, the generator was trained to minimize the same equation, bringing the distributions of the synthesized and original matrices closer together:

$$
L_{\mathrm{WGAN}} = \mathbb{E}_{x \sim P_r}\!\left[C(x)\right] - \mathbb{E}_{\tilde{x} \sim P_g}\!\left[C(\tilde{x})\right] - \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\left[\left(\lVert \nabla_{\hat{x}} C(\hat{x}) \rVert_2 - 1\right)^2\right] \tag{1}
$$

where *P*_{r} is the distribution of given connectivity matrices, and *P*_{g} is the distribution of synthesized matrices. *C*(*x*) denotes the output of the critics for a given input *x*, and *x̃* = *G*(*z*) represents the output of the decoder when given a latent variable *z* ∼ *p*(*z*) generated by the encoder; *x̂* is sampled uniformly along straight lines between pairs of points drawn from *P*_{r} and *P*_{g}. The third term in Equation 1 introduces a gradient penalty encouraging *C*(*x*) to adhere to 1-Lipschitz continuity, with *λ* being the coefficient of this penalty.

### 2.2 Task-guided GAN II

In this study, we have built upon the baseline WGAN-GP model to develop a novel GAN-based connectivity synthesis model named TG GAN II. Our proposed model extends the WGAN-GP by incorporating a task-guided branch specifically designed for prediction tasks. Its objective is to augment the dataset with additional connectivity matrices and their corresponding objective variables, thereby facilitating the construction of a more accurate cognitive trait prediction model based on brain connectivity.

**Figure 3** illustrates the architecture of TG GAN II. Within this framework, a regressor model for cognitive trait prediction was integrated into the WGAN-GP structure as the task-guided branch to enhance the performance of the prediction task. This branch was designed to shape a latent space that captured the variations in both explanatory and objective variables. It consisted of four convolutional layers and four fully connected layers, culminating in the output of the predicted objective variable from the synthesized connectivity matrix. The loss function for the task-guided regressor of TG GAN II, outlined in Equation 2, was calculated as the root mean squared error (RMSE) between the observed and predicted objective variables, with both the generator and critics optimized to minimize this function:

$$
L_R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{2}
$$

where *y*_{i} denotes the observed objective variable and *ŷ*_{i} is the predicted one. In the training of TG GAN II, Equations 1 and 2 were optimized simultaneously, giving the comprehensive loss function for TG GAN II in Equation 3:

$$
L_{\mathrm{TG\,GAN\,II}} = L_{\mathrm{WGAN}} + \alpha L_R \tag{3}
$$

where the coefficient *α* served as the weight on *L*_{R}, the regression loss, balancing the two terms of the loss function.

### 2.3 Latent space interpolation for connectivity matrices and objective variable synthesis

Using the trained TG GAN II model, we synthesized new connectivity matrices and their corresponding objective variables by interpolation in the latent space. A single pair of a connectivity matrix and its objective variable was synthesized from two acquired pairs (*X*_{i}, *y*_{i}) and (*X*_{j}, *y*_{j}) following Equations 4 and 5, referring to the interpolation method provided by Li et al. [13]:

$$
\tilde{X} = G\!\left( \varepsilon\, E(X_i) + (1 - \varepsilon)\, E(X_j) \right) \tag{4}
$$

$$
\tilde{y} = \varepsilon\, y_i + (1 - \varepsilon)\, y_j \tag{5}
$$

where *E* denotes the encoder, *G* denotes the decoder, and *ε* is a value between 0 and 1. This procedure is illustrated in **Figure 4**.
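The interpolation step itself can be sketched as follows (a minimal numpy illustration; `z_i` and `z_j` stand for encoder outputs, and passing the interpolated latent variable through the trained decoder, which is omitted here, would yield the synthesized connectivity matrix).

```python
import numpy as np

def interpolate_pair(z_i, z_j, y_i, y_j, eps):
    """Mix two encoded samples and their objective variables with
    the same interpolation weight eps in [0, 1]; decoding z_syn into
    a connectivity matrix is omitted in this sketch."""
    z_syn = eps * z_i + (1.0 - eps) * z_j  # latent variable for the decoder
    y_syn = eps * y_i + (1.0 - eps) * y_j  # matched objective variable
    return z_syn, y_syn

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=64), rng.normal(size=64)
z_new, y_new = interpolate_pair(z_a, z_b, y_i=12.0, y_j=20.0, eps=0.25)
```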

### 2.4 Model training of generative models for data augmentation

#### 2.4.1 Dataset

In this study, we used MRI images and cognitive scores from the NIMH Healthy Research Volunteer Dataset (https://openneuro.org/datasets/ds004215/versions/1.0.1) [19]. We used T1-weighted images, diffusion-weighted images (DWI), and the age-adjusted scores of the NIH Toolbox Cognition Battery [22] as cognitive traits from the 108 samples in which the T1-weighted image, DWI, and resting-state fMRI had been obtained without deficiencies. The mean and standard deviation of these scores are shown in Table 1. For each participant, we summed the four task scores and used the sum as the fluid intelligence score. This composite score shows a higher signal-to-noise ratio than any single task score and is less affected by the variability of individual task scores [23].
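The composite score computation reduces to a simple sum over the four task scores; the sketch below uses hypothetical task names and age-adjusted values purely for illustration.

```python
# Hypothetical age-adjusted scores for one participant on four
# NIH Toolbox tasks (names and values are illustrative only)
task_scores = {
    "task_a": 95.0,
    "task_b": 102.0,
    "task_c": 110.0,
    "task_d": 98.0,
}

# the composite fluid intelligence score is the sum of the four task scores
fluid_intelligence = sum(task_scores.values())
```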

#### 2.4.2 MRI data preprocessing

We preprocessed DWI and T1-weighted images using a pre-configured DWI processing pipeline, QSIPrep (0.15.4) [24]. The preprocessing consisted of the following six steps: (1) T1-weighted image preprocessing, (2) DWI denoising using Marchenko-Pastur Principal Component Analysis (MP-PCA) [25], (3) B1 field inhomogeneity correction, (4) susceptibility distortion correction (TOPUP) [26], (5) eddy current and head motion correction (EDDY) [27], and (6) registration of DWI to the T1-weighted image.

#### 2.4.3 Structural connectivity mapping

We also reconstructed whole brain tractograms and structural connectivity using QSIPrep (0.15.4) [24]. The reconstruction of structural connectivity maps was performed in the following steps. First, the fiber orientation distribution function (fODF) was estimated using the Single-Shell 3-Tissue constrained spherical deconvolution (SS3T CSD) model [28] (the response function was estimated following Dhollander et al. [29]). Second, the whole brain tractogram was estimated to be 10 million streamlines using probabilistic (iFOD2 [30]) and anatomically-constrained tractography (ACT) [31]. Third, we computed streamline weights based on the SIFT2 algorithm that reduced the biases in probabilistic tractography [32]. Finally, the whole brain was parcellated into 116 regions based on automated anatomical labeling (AAL), and the structural connectivity was defined as the sum of SIFT2-weighted streamlines connecting two arbitrary regions divided by the sum of the volumes of those regions.
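The final volume-normalization step can be sketched as follows (a minimal numpy illustration with toy streamline weights and region volumes; in the actual pipeline, the weights come from SIFT2 and the regions from the AAL parcellation).

```python
import numpy as np

def structural_connectivity(streamline_weights, region_volumes):
    """SC(i, j) = sum of SIFT2-weighted streamlines between regions i and j,
    divided by the sum of the two regions' volumes."""
    vol_sums = region_volumes[:, None] + region_volumes[None, :]
    sc = streamline_weights / vol_sums
    np.fill_diagonal(sc, 0.0)  # no self-connections
    return sc

# toy example with 3 regions (values are illustrative, not real tractography)
weights = np.array([[0.0, 10.0, 4.0],
                    [10.0, 0.0, 2.0],
                    [4.0, 2.0, 0.0]])
volumes = np.array([100.0, 100.0, 50.0])
sc = structural_connectivity(weights, volumes)
```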

#### 2.4.4 Training setup for generative models

The entire dataset was divided into the discovery dataset (70%; *N* = 75) and the test dataset (30%; *N* = 33). In this division, we sorted the samples by their objective variables and then assigned the samples with ranks (2nd, 3rd, 5th, 6th, …, 103rd, 105th, 106th, 108th) to the discovery dataset and (1st, 4th, 7th, …, 101st, 104th, 107th) to the test dataset. This method, proposed by Cui et al. [33], was employed to ensure a similar distribution of behavioral scores across datasets and to minimize the random bias resulting from the division. The discovery dataset was used to train TG GAN II and WGAN-GP, select the data augmentation model, and construct the prediction model, while the test dataset was used to validate the prediction model's accuracy. We adopted a hold-out validation scheme for the training and model selection processes in TG GAN II and WGAN-GP. As in the discovery-test splitting, we sorted the discovery samples according to their objective variables and assigned samples with ranks (2nd, 3rd, 5th, …, 72nd, 74th, 75th) to the training dataset and (1st, 4th, …, 73rd) to the validation dataset.
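The rank-interleaved splitting scheme can be sketched as follows (a simplified plain-Python illustration of the idea of sending every third rank to the held-out set; the paper's exact rank assignment may differ slightly at the boundaries).

```python
def rank_based_split(scores, period=3):
    """Sort samples by their objective variable and send every `period`-th
    rank (1st, 4th, 7th, ...) to the held-out set and the rest to the
    discovery set, keeping the two score distributions similar."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    held_out = [idx for rank, idx in enumerate(order) if rank % period == 0]
    discovery = [idx for rank, idx in enumerate(order) if rank % period != 0]
    return discovery, held_out

scores = [5, 1, 9, 3, 7, 2, 8, 4, 6]
disc, held = rank_based_split(scores)
```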

The TG GAN II and WGAN-GP models were trained with the Adam optimizer with momentum parameters *β*_{1} = 0.9, *β*_{2} = 0.999 (learning rate: 0.0001, batch size: 2) and were run for up to 2000 epochs. We implemented an early stopping strategy to prevent overfitting on the training dataset: training was stopped if the regressor loss on the validation dataset did not improve for 50 consecutive epochs after reaching the 500th epoch.
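The early stopping rule can be sketched as follows (a plain-Python illustration; `warmup` and `patience` mirror the 500-epoch and 50-epoch settings above, and the loss history here is synthetic).

```python
def should_stop(val_losses, warmup=500, patience=50):
    """Return True when training should stop: after `warmup` epochs,
    stop if the best validation loss has not improved for `patience`
    consecutive epochs."""
    epoch = len(val_losses)
    if epoch <= warmup:
        return False  # never stop before the warmup epoch
    best_epoch = min(range(epoch), key=lambda i: val_losses[i])
    return (epoch - 1) - best_epoch >= patience

# a steadily improving validation loss never triggers the rule
history = [1.0 / (epoch + 1) for epoch in range(600)]
stop = should_stop(history)
```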

The dropout rate *p* of the encoder, critics, and regressor was selected from a set of values: 0.1, 0.2, 0.3, 0.4, and 0.5. Similarly, the coefficient *α* was selected from a range between 0.1 and 1.0 in increments of 0.1. For all trained TG GAN II models, we selected the model that minimized the sum of the regressor loss on the training and validation datasets for data augmentation. In the case of WGAN-GP models, which do not include a regressor, we selected models based on having the same dropout rate as the chosen TG GAN II model.

#### 2.4.5 Evaluation of synthesized connectivity matrices based on graph theory metrics

We evaluated the similarity between the structural connectivity matrices synthesized by TG GAN II and WGAN-GP and the original (acquired) matrices within the discovery dataset. First, we synthesized a connectivity matrix for each sample using these models. To quantify the differences between the acquired and synthesized matrices, we subtracted the synthesized matrices from their corresponding acquired matrices for each generative model. Then, we calculated the average difference matrix for each model. These averaged difference matrices were visualized and compared.

In the subsequent analysis, we quantitatively assessed the similarity between the acquired and synthesized matrices using graph theory metrics: connectivity strength, betweenness centrality, and clustering coefficient. For this purpose, we computed the average matrices for both acquired and synthesized connectivity matrices and binarized them according to edge density thresholds, ranging from 5% to 25% in increments of 5%. We then estimated the distribution of these three graph theory metrics for the acquired and synthesized averaged matrices. The similarity between these distributions was quantified using the Kullback-Leibler (KL) divergence.
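The density thresholding and distribution comparison can be sketched as follows (a minimal numpy illustration; keeping the strongest fraction of off-diagonal edges is one common reading of density thresholding, and the toy matrix is random rather than a real connectome).

```python
import numpy as np

def binarize_by_density(mat, density):
    """Binarize a symmetric matrix by keeping the strongest `density`
    fraction of off-diagonal edges."""
    n = mat.shape[0]
    upper = mat[np.triu_indices(n, k=1)]
    k = max(1, int(round(density * upper.size)))
    thresh = np.sort(upper)[-k]  # k-th largest edge weight
    binary = (mat >= thresh).astype(float)
    np.fill_diagonal(binary, 0.0)
    return binary

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence between two histograms after normalization."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(1)
m = rng.random((10, 10))
m = (m + m.T) / 2.0  # toy symmetric "connectivity" matrix
b = binarize_by_density(m, density=0.25)
```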

#### 2.4.6 Latent space evaluation

We assessed the relevance between the fluid intelligence scores and the latent space generated by TG GAN II and WGAN-GP encoders. Latent variables *z* were sampled from the discovery dataset. We projected the acquired latent variables onto a two-dimensional space using principal component analysis (PCA) to visualize the latent space. Subsequently, we calculated the Pearson correlation between the first principal component (PC1) scores and the corresponding fluid intelligence scores.
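The projection and correlation analysis can be sketched as follows (an SVD-based PCA in plain numpy; the toy latent space is constructed so that its first axis tracks a hypothetical cognitive score, which is not real data).

```python
import numpy as np

def pc1_scores(latents):
    """Project latent variables onto their first principal component
    (SVD-based PCA, no external dependencies)."""
    centered = latents - latents.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def pearson_r(a, b):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float((a * b).mean())

# toy latent space whose first axis tracks a hypothetical cognitive score
rng = np.random.default_rng(0)
score = rng.normal(size=50)
latents = np.column_stack([score * 3.0, rng.normal(size=50) * 0.1])
r = pearson_r(pc1_scores(latents), score)
```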

### 2.5 Performance evaluation on data augmentation for connectome-based prediction

To evaluate the effectiveness of TG GAN II, we built the regression model to predict the fluid intelligence scores from the structural connectivity matrices augmented by two generative models. We used the Ridge regression algorithm [34], which is widely used in the neuroscience field. The prediction performance of the fluid intelligence was then compared between models augmented by TG GAN II and WGAN-GP.
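For reference, Ridge regression has a closed-form solution; the sketch below is a minimal numpy stand-in for the library implementation used in practice (the toy data and true weights are illustrative only).

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form Ridge solution: w = (X^T X + alpha * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def ridge_predict(X, w):
    return X @ w

# toy example: recover a known linear relation from noisy observations
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.01, size=100)
w_hat = ridge_fit(X, y, alpha=0.1)
```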

#### 2.5.1 Training setup for a prediction model

To assess the prediction model’s efficacy within the discovery dataset, we utilized repeated 5-fold cross-validation (5F CV). Additionally, the L2 regularization parameter in Ridge regression was fine-tuned using 5-fold cross-validation within each fold of the outer 5F CV. The methodology for this repeated nested CV process is outlined in subsequent sections.

##### Repeated outer 5F CV

In the outer 5F CV phase, the discovery dataset was randomly divided into five subsets. Four subsets were merged and used as the training dataset, while the remaining subset served as the validation dataset. We constructed the prediction model using the training dataset and the parameters determined during the inner 5F CV phase. This model was then validated against the validation dataset. This cycle of training and validation was repeated until each subset had been used as the validation dataset once. The prediction accuracy was calculated as the Pearson correlation between the observed and predicted scores. The outer 5F CV was repeated 20 times to avoid the bias resulting from random splitting.

##### Inner 5F CV and parameter tuning

Within each iteration of the outer 5F CV, the L2 regularization parameter for Ridge regression was optimized through an inner 5F CV process. We selected this parameter from a set of 16 possible values: *α* ∈ {*x* | *x* = 2^{n}, *n* ∈ ℤ, *n* ∈ [−10, 5]}. For the inner 5F CV, the training dataset from the outer loop was randomly split into five subsets. Four subsets were used in each iteration to train the model under each parameter setting, with the fifth subset reserved for validation. This procedure was executed five times, ensuring each subset was utilized as the validation dataset once. For each parameter, we calculated the Pearson correlation and the mean absolute error (MAE) between the observed and predicted scores, averaging these metrics across the inner 5F CV loops. The sum of the mean correlation and the inverse of MAE, standardized to account for their differing scales, served as the measure of prediction accuracy in inner validation [6], [33]. The parameter that yielded the highest-accuracy model was then used in the subsequent outer 5F CV.
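The parameter selection rule can be sketched as follows (one plausible reading of "standardized" here is z-scoring each metric across the candidate parameters before summing; the paper's exact standardization may differ, and the metric values below are hypothetical).

```python
import numpy as np

# the 16 candidate regularization parameters: 2^n for n in [-10, 5]
params = [2.0 ** n for n in range(-10, 6)]

def select_parameter(mean_corrs, mean_maes):
    """Z-score the mean correlations and the inverse MAEs across the
    candidate parameters so their scales are comparable, sum them,
    and return the index of the best-scoring parameter."""
    corrs = np.asarray(mean_corrs, dtype=float)
    inv_mae = 1.0 / np.asarray(mean_maes, dtype=float)

    def zscore(v):
        return (v - v.mean()) / v.std()

    return int(np.argmax(zscore(corrs) + zscore(inv_mae)))
```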

##### Evaluating test-retest reliability

To evaluate the test-retest reliability of the prediction model built through repeated nested CV, 100 prediction models from the outer 5F CV were validated using the test dataset. The prediction accuracy was defined by Pearson’s correlation and RMSE between observed and predicted scores.

#### 2.5.2 Evaluation of prediction performance

Finally, we evaluated the accuracy of fluid intelligence prediction models augmented by TG GAN II and WGAN-GP. Data augmentation was applied to the discovery dataset, increasing the number of structural connectivity matrices and fluid intelligence scores to double (original data + 100%), triple (+ 200%), quadruple (+ 300%), and quintuple (+ 400%) through the latent space interpolation detailed in section 2.3. Using the original and augmented datasets, we constructed fluid intelligence prediction models with the repeated nested CV method outlined in section 2.5.1 and validated them against the test dataset. To compare the prediction accuracy between the data-augmented models and the baseline (non-augmented: original sample size) model, we conducted Welch's *t*-test. For this analysis, outlier detection was performed on the accuracy of the 100 prediction models, identifying any values more than three standard deviations from the mean as outliers. Additionally, we applied the Bonferroni correction for multiple comparisons.
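The outlier screening and Welch's *t* statistic can be sketched as follows (a plain-Python illustration; the accuracy values are hypothetical, and the p-value computation against the *t* distribution is omitted).

```python
import math

def remove_outliers(values, n_sd=3.0):
    """Drop values more than n_sd sample standard deviations from the mean."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [v for v in values if abs(v - mean) <= n_sd * sd]

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# hypothetical prediction accuracies of augmented vs. baseline models
baseline_acc = [0.30, 0.32, 0.28, 0.31, 0.29]
augmented_acc = [0.36, 0.38, 0.35, 0.37, 0.34]
t = welch_t(remove_outliers(augmented_acc), remove_outliers(baseline_acc))
```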

## 3. Results

### 3.1 Analysis of synthesized connectivity matrices

**Figure 5** presents the average connectivity matrices for both the acquired and the synthesized structural matrices from TG GAN II and WGAN-GP, with brain regions aligned according to Yeo's 7-network definition. **Figure 5A** displays the weighted matrices, whereas **Figure 5B** shows those binarized at a 25% edge density threshold. The matrices synthesized by WGAN-GP exhibit connections that do not exist in the acquired matrices, a pattern not observed with TG GAN II. This is evident in both the weighted and binarized connectivity matrices.

**Figure 6** displays the average difference between the acquired connectivity matrices and those synthesized by TG GAN II and WGAN-GP. The connectivity matrix was partitioned into intra-hemispherical connections (left hemisphere: A, right hemisphere: C), inter-hemispherical connections (B), cerebrum-cerebellum connections (D), and intra-cerebellar connections (E). The differences were notable in the intra-hemispherical and intra-cerebellar connections, where both TG GAN II and WGAN-GP tended to estimate weaker connections than those observed in the actual data. Additionally, WGAN-GP generated more inter-hemispherical and cerebrum-cerebellum connections than TG GAN II, indicating that TG GAN II synthesized connectivity matrices more similar to the acquired data in these regions.

**Figure 7** presents a quantitative analysis of the topological features of both the acquired and synthesized connectivity matrices, using graph theory metrics: connectivity strength, betweenness centrality, and clustering coefficient. The distribution of each graph metric is displayed as a density plot, along with the similarity between the distributions of the acquired and synthesized matrices estimated using KL-divergence. Notably, the KL-divergence values and density plots indicate no appreciable difference in the distribution of connectivity strength between matrices synthesized by TG GAN II and those by WGAN-GP. Conversely, the distributions of betweenness centrality and clustering coefficient show that TG GAN II produced matrices with topological characteristics more closely aligned with those of the acquired matrices than WGAN-GP, as evidenced by smaller KL-divergence values and more similar density plots across all edge density thresholds.

### 3.2 Latent space analysis

**Figure 8** illustrates the latent spaces of each generative model, projected into two-dimensional space using PCA, with each sample annotated by its corresponding fluid intelligence score. The left side of **Figure 8** displays the PCA-projected latent spaces, while the right side shows the correlation between the first principal component (PC1) and the fluid intelligence scores. The correlation is higher in TG GAN II (*r* = 0.68, *p* < .05) than WGAN-GP (*r* = 0.18, *p* > .05), indicating that the latent space in TG GAN II is more closely associated with fluid intelligence scores. These results demonstrated that TG GAN II effectively captured the variance in fluid intelligence scores within its latent space.

### 3.3 Prediction performance

**Figure 9** and **Figure 10** illustrate the accuracy of the prediction models in terms of Pearson's correlation between observed and predicted fluid intelligence scores and RMSE, respectively, for data augmented by TG GAN II and WGAN-GP. The term ‘baseline’ in each figure refers to models developed solely with acquired samples, without data augmentation. Each data point represents a model constructed through the repeated nested CV method. The results in **Figure 9** reveal that data augmentation with TG GAN II significantly enhanced the correlation between observed and predicted scores beyond the baseline (*p* < .01, Bonferroni-corrected), whereas augmentation with WGAN-GP did not. Regarding RMSE, augmentation with neither TG GAN II nor WGAN-GP led to improvements; moreover, data augmentation with WGAN-GP resulted in worse RMSE than the baseline, whereas the RMSE of models augmented with TG GAN II remained consistent with the baseline (**Figure 10**). These results are also reflected in the summary of the accuracy of data-augmented and baseline prediction models in **Table 2**. Notably, increasing the number of synthesized samples did not lead to further improvements in prediction accuracy, even with TG GAN II.

## 4. Discussion

### 4.1 The relationship between graph structure and prediction accuracy

In this study, we introduced TG GAN II, a novel method for augmenting brain connectivity data, to enhance the accuracy of predictive models for human cognitive traits and behavior. The analysis of the fluid intelligence prediction model, constructed using the NIMH Healthy Research Volunteer Dataset, showed that TG GAN II could synthesize structural connectivity matrices with features more closely aligned with the actual data than its baseline model, WGAN-GP. Furthermore, data augmentation with TG GAN II resulted in improved prediction accuracy.

Upon examining the average synthesized and acquired matrices, we observed that WGAN-GP tended to overgenerate inter-hemispherical connections compared to TG GAN II, failing to reflect the relative sparsity of such connections in the actual data. Zimmermann et al. have suggested that intra-hemispherical structural connectivity is crucial for capturing the variance in cognitive traits, including fluid intelligence [35]. Unlike WGAN-GP, TG GAN II effectively suppressed spurious inter-hemispherical connections while preserving intra-hemispherical connections, thereby contributing to the enhanced predictive accuracy of the model.

Furthermore, a graph theoretical analysis of both acquired and synthesized connectivity matrices showed that TG GAN II could produce structural connectivity matrices with topological features more closely matching those of the actual data than WGAN-GP. Garai et al. highlighted that network patterns or topological features in structural connectivity possess predictive power for human cognitive traits [36]. While Litwińczuk observed no consistent advantage in utilizing graph theory measures over connectivity values for explaining and predicting cognitive functions in healthy and typical domains, they noted instances where nodal graph theory metrics of the structural network outperformed raw connectivity models in predictive ability [37]. These prior studies support that the enhanced prediction accuracy observed with matrices synthesized by TG GAN II can be attributed to their graph theoretical features, which are more akin to those of the acquired matrices.

Consequently, our results suggest that the task-guided branch, implemented as the regression model in TG GAN II, could enhance the synthesis of relevant connectomic features for fluid intelligence prediction while suppressing ineffective features.

### 4.2 Latent space structure

Previous studies, the foundational work on Task-guided GANs [13] and BrainNetGAN [12], have demonstrated the efficacy of task-guided branches in enhancing task performance: specifically, improving age prediction accuracy from T1-weighted images and classification accuracy for Alzheimer's disease, respectively.

In our study, by visualizing the embedded latent space using PCA, we have shown that the task-guided branch not only improved the task performance but also formed a latent space where latent variables were significantly correlated with fluid intelligence scores (**Figure 8**). This visualization in two-dimensional space provides a novel insight absent in prior research, highlighting that our approach enables a deeper understanding of how input variables are integrated with outcome variables within the latent space of generative models. This finding emphasizes the unique contribution of our method to the field, demonstrating that the task-guided branch is effective in forming a latent space that bridges input variables and outcome variables.

### 4.3 Limitations and future work

The demonstrated efficacy of TG GAN II in connectome-based prediction tasks suggests promising directions for future investigation, although several limitations warrant discussion.

While our evaluation focused on synthesizing structural connectomes, the versatile framework of TG GAN II is not confined to this application alone; it holds the potential for enhancing functional connectome-based predictions through data augmentation. In future work, we will explore whether TG GAN II’s augmentation capabilities can indeed improve the accuracy of predictions based on functional connectomes.

Moreover, the initial application of our method was conducted on a relatively small dataset, a common scenario in studies where data augmentation aims to tackle the challenges posed by limited sample sizes. Consequently, the scalability and performance of TG GAN II in contexts involving larger datasets remain to be fully explored. Future studies are encouraged to extend the application of our method to more extensive datasets.

## 5. Supporting Information

The source code for the methods used in this paper will be made available at https://github.com/MIS-Lab-Doshisha/tg-gan2.

## 6. Acknowledgements

None to declare.