Task-guided Generative Adversarial Networks for Synthesizing and Augmenting Structural Connectivity Matrices for Connectivity-Based Prediction

Recent machine learning techniques have improved the modeling of complex dependencies between brain connectivity and cognitive or behavioral traits, facilitating connectome-based predictions. However, they typically require large datasets. While large open datasets such as the Human Connectome Project have offered significant benefits to connectomics research, collecting such large datasets remains challenging due to financial and time costs. To address this issue, we propose Task-guided GAN II, a novel data augmentation method leveraging generative adversarial networks (GANs) to enhance the sample size of limited datasets for connectome-based prediction tasks. Unlike previous approaches, our method incorporates a task-guided branch within the conventional Wasserstein GAN framework, specifically designed to synthesize structural connectivity matrices. It aims to effectively augment data and improve the prediction accuracy of human cognitive traits by capturing more task-directed features in the data. We evaluated the effectiveness of data augmentation using Task-guided GAN II in predicting fluid intelligence with the NIMH Healthy Research Volunteer Dataset. Our results demonstrate that data augmentation with Task-guided GAN II not only improves prediction accuracy but also yields a latent space that effectively captures correlations between structural connectivity and cognitive outcomes. Our method should prove beneficial for leveraging small datasets in human connectomics research.


Introduction
Connectomics, which represents neural circuits as networks and analyzes their topology through mathematical approaches, has emerged as an essential methodology in cognitive and computational neuroscience [1], [2]. It facilitates a deeper understanding of the complex relationships among brain networks, cognition, behavior, and individual differences in human cognitive traits and behaviors. Additionally, recent advances in machine learning have enabled connectome-based models that use structural and functional connectivity matrices to predict behavior or cognitive traits [3].
Recent studies have revealed that building prediction models requires more than a hundred samples [4], [5], and that accuracy and robustness improve as the sample size grows, regardless of the choice of regression model [6]. Therefore, researchers have often used large open datasets such as the Human Connectome Project (HCP) Dataset (N = 1200) [7], the ABCD Study (N = 11878) [8], and the UK Biobank (N = 40000+) [9]. These datasets have contributed significantly to developing analytical methods and understanding human connectomics [10]. However, Yeung et al. have reported that most self-recruited samples included around 100 subjects [5]. In the exploratory stage of research, it is not easy for individual researchers to collect samples of more than 100 participants due to limitations on time and cost.
One of the best solutions to the small-sample-size issue is data augmentation, which generates new samples by manipulating the given small dataset. In traditional data augmentation, geometric transformations such as rotation or flipping have often been applied to image data. However, such manipulations are inapplicable to the matrix-shaped data of brain functional and structural connectivity because their shape is constrained by the order of brain regions. In recent years, data augmentation methods using generative models have been proposed [11], [12], [13], [14], [15], [16], [17]. In these methods, generative augmentation models approximate the distribution of a given dataset and synthesize new samples with characteristics similar to the original dataset. Specifically, generative adversarial networks (GANs) have been increasingly used for data synthesis in numerous application fields owing to their strength in learning the local and global structure of data [18]. In the neuroscience field, Chao Li et al. proposed a structural connectivity augmentation method named BrainNetGAN to build a model for the classification of Alzheimer's disease [12]. The accuracy of their graph neural network-based classification model of Alzheimer's disease was improved using the structural connectivity data synthesized by BrainNetGAN. Ruizhe Li et al. also proposed a data augmentation method for T1-weighted images named Task-guided GAN (TG GAN) for a brain age prediction model, improving the accuracy of the age prediction [13]. In their TG GAN model, the task-guided branch of a regression model for brain age prediction was incorporated into the GAN architecture. This made the data synthesis more task-specific and contributed to improving the prediction task.
There are many studies on data augmentation methods for medical images, but few on brain connectome data augmentation for connectome-based prediction tasks. The BrainNetGAN, as mentioned above, was developed for a classification task and trained with specific class labels to synthesize conditioned data. On the other hand, the TG GAN was designed with a 3D image input structure specifically for T1-weighted image synthesis. Neither model is suitable for synthesizing connectivity matrices aimed at prediction tasks. Given the rapidly increasing interest in connectome-based prediction studies, there is an urgent need for a data augmentation model tailored to these tasks. Here, we propose a novel data augmentation method, Task-guided GAN II (TG GAN II). It is designed for synthesizing structural connectivity matrices and employs a task-guided branch that predicts human cognitive traits from structural connectivity. We aim to enhance the accuracy of predicting cognitive traits from structural connectivity by augmenting the original dataset with data synthesized by our TG GAN II. To assess the effectiveness of our method, we utilize the NIMH Healthy Research Volunteer Dataset [19] to build a model predicting fluid intelligence from structural connectivity. The TG GAN II employs the Wasserstein GAN (WGAN) without task-guided branches as its baseline model and is evaluated against this baseline from three perspectives: 1) the similarity of graph features between synthesized and original data, 2) the improvement in prediction performance, and 3) the influence of the added synthetic data volume on prediction performance. We hope that these investigations will make a substantial contribution toward addressing the sample size challenges in connectome-based prediction tasks.

Baseline model: Wasserstein GAN with gradient penalty (WGAN-GP)
We employed a Wasserstein GAN with gradient penalty (WGAN-GP) [20] as the baseline model for our proposed TG GAN II. The WGAN-GP incorporates the Wasserstein distance into its objective function to improve the stability of model training by suppressing mode collapse.
It consisted of a generator that synthesized new structural connectivity matrices and critics that assessed the synthesized matrices by estimating the Wasserstein distance between synthesized and given ones. We used an autoencoder-based generator in our WGAN-GP model to synthesize new matrices by interpolating the latent feature space (Figure 1). The encoder in the generator module extracted features from given connectivity matrices and mapped them into the latent space, and the decoder module synthesized matrices from the latent variables. We employed the convolutional neural network for brain networks (BrainNetCNN) proposed by Kawahara et al. [21], consisting of four convolutional layers and two fully connected layers, for the encoder. The BrainNetCNN had three convolutional layers specialized for extracting topological features of connectivity matrices: edge-to-edge (E2E), edge-to-node (E2N), and node-to-graph (N2G) convolutions (Figure 2). We adopted a simple feed-forward neural network with five fully connected layers for the decoder. We used the softplus function as the activation in the final layer because the elements of structural connectivity take values of zero or more. The decoder outputted a vector of the lower-triangular part of the connectivity matrix, which was finally matricized. We also utilized the BrainNetCNN architecture with four convolutional layers for the critics. The mathematical description of the loss function for the generator and critics in our WGAN-GP is presented in Equation 1. The critics aimed to maximize this equation to accurately estimate the Wasserstein distance between the synthesized and original matrices. Conversely, the generator was trained to minimize the same equation, bringing the distributions of the synthesized and original matrices closer together.
L(D, G) = E_{x~P_r}[D(x)] − E_{x̃~P_g}[D(x̃)] − λ E_{x̂~P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2]    (Equation 1)

where P_r is the distribution of given connectivity matrices and P_g is the distribution of synthesized matrices. D(x) denotes the output of the critics for a given input x, and x̃ = G(z) represents the output of the decoder given a latent variable z produced by the encoder. The third term in Equation 1 introduces a gradient penalty encouraging D(x) to adhere to 1-Lipschitz continuity, where x̂ is sampled uniformly along straight lines between pairs of given and synthesized matrices, and λ is the coefficient of this penalty.
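As an illustration, the terms of Equation 1 can be evaluated with a toy linear critic, for which the input-gradient (and hence the gradient penalty) is available in closed form. This is a minimal numpy sketch under stated assumptions, not the actual model: in the paper the critics are BrainNetCNNs and the gradient is obtained by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, w):
    # Toy linear critic D(x) = <w, x>; its gradient w.r.t. x is simply w.
    return x @ w

def wgan_gp_objective(real, fake, w, lam=10.0):
    # Objective of Equation 1, maximized by the critics and minimized by
    # the generator: E[D(x)] - E[D(x_tilde)] - lam * gradient penalty.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1 - eps) * fake           # real/fake interpolates
    grad = np.tile(w, (x_hat.shape[0], 1))          # analytic grad of linear D
    penalty = np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)
    return critic(real, w).mean() - critic(fake, w).mean() - lam * penalty

real = rng.normal(1.0, 0.1, size=(8, 4))            # stand-in "acquired" data
fake = rng.normal(0.0, 0.1, size=(8, 4))            # stand-in "synthesized" data
w = np.ones(4) / 2.0                                # ||w||_2 = 1 -> zero penalty
objective = wgan_gp_objective(real, fake, w)
```

Because the toy critic is exactly 1-Lipschitz here, the penalty vanishes and the objective reduces to the estimated Wasserstein distance between the two sample sets.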

Task-guided GAN II
In this study, we have built upon the baseline WGAN-GP model to develop a novel GAN-based connectivity synthesis model named TG GAN II. Our proposed model extends the WGAN-GP by incorporating a task-guided branch specifically designed for prediction tasks. Its objective is to augment the dataset with additional connectivity matrices and their corresponding objective variables. The task-guided branch is a regression model trained with the loss in Equation 2:

L_reg = sqrt((1/N) Σ_i (y_i − ŷ_i)^2)    (Equation 2)

where y_i denotes the observed objective variable and ŷ_i is the predicted one. In the training of TG GAN II, both Equations 1 and 2 were optimized simultaneously, resulting in the comprehensive loss function for TG GAN II represented in Equation 3:

L_TG GAN II = L(D, G) + γ L_reg    (Equation 3)

where the coefficient γ served as the weight against L_reg, the regression loss, balancing the two terms of the loss function.
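A minimal sketch of how the comprehensive loss combines the two terms, assuming the adversarial term has already been computed; the names `rmse`, `adv_loss`, and `gamma` follow the description above and are otherwise hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Regression loss of the task-guided branch (Equation 2): RMSE between
    # observed and predicted objective variables.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def tg_gan2_loss(adv_loss, y_true, y_pred, gamma=0.5):
    # Comprehensive loss (Equation 3): adversarial term plus the
    # task-guided regression term weighted by gamma.
    return adv_loss + gamma * rmse(y_true, y_pred)

# toy values: two observed scores and their predictions
loss = tg_gan2_loss(adv_loss=1.2, y_true=[100, 110], y_pred=[104, 107], gamma=0.5)
```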

Latent space interpolation for connectivity matrices and objective variable synthesis
Using the trained TG GAN II model, we aimed to synthesize new connectivity matrices and their corresponding objective variables by performing interpolation in the latent space. A single pair of a connectivity matrix and its objective variable was synthesized from two given pairs (x_i, y_i) and (x_j, y_j) following Equations 4 and 5, referring to the method provided by Li et al. [13]:

x_new = Decoder(α z_i + (1 − α) z_j)    (Equation 4)
y_new = α y_i + (1 − α) y_j    (Equation 5)

where z_i and z_j are the latent variables obtained by encoding x_i and x_j, and α is a value between 0 and 1. This procedure is illustrated in Figure 4.
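The interpolation step can be sketched as follows; the decoder call is omitted, and `z_i`, `z_j` stand for latent vectors already produced by the trained encoder (toy values here).

```python
import numpy as np

def interpolate_pair(z_i, z_j, y_i, y_j, alpha):
    # Latent-space interpolation (Equations 4-5): the new latent vector and
    # objective variable are convex combinations of two encoded samples;
    # the decoder would then map z_new back to a connectivity matrix.
    z_new = alpha * z_i + (1 - alpha) * z_j
    y_new = alpha * y_i + (1 - alpha) * y_j
    return z_new, y_new

z_new, y_new = interpolate_pair(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                                y_i=90.0, y_j=110.0, alpha=0.25)
```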

Dataset
In this study, we used MRI images and cognitive scores from the NIMH Healthy Research Volunteer Dataset (https://openneuro.org/datasets/ds004215/versions/1.0.1) [19]. We used T1-weighted images, diffusion-weighted images (DWI), and the age-adjusted scores of the NIH Toolbox Cognition Battery [22] as cognitive traits from the 108 samples in which the T1-weighted image, DWI, and resting-state fMRI had been obtained without deficiencies. The mean and standard deviation of these scores are shown in Table 1. We summed the four task scores for each participant and used the sum as a fluid intelligence score. This composite score shows a higher signal-to-noise ratio than each individual task score and is less affected by the variability of each task score [23].

Structural connectivity mapping
We reconstructed whole-brain tractograms and structural connectivity using QSIPrep (0.15.4) [24]. The reconstruction of structural connectivity maps was performed in the following steps.
First, the fiber orientation distribution function (fODF) was estimated using the Single-Shell 3-Tissue constrained spherical deconvolution (SS3T CSD) model [28] (the response function was estimated following Dhollander et al. [29]). Second, the whole-brain tractogram was estimated with 10 million streamlines using probabilistic (iFOD2 [30]) and anatomically-constrained tractography (ACT) [31]. Third, we computed streamline weights based on the SIFT2 algorithm, which reduces the biases in probabilistic tractography [32]. Finally, the whole brain was parcellated into 116 regions based on automated anatomical labeling (AAL), and the structural connectivity was defined as the sum of SIFT2-weighted streamlines connecting two arbitrary regions divided by the sum of the volumes of those regions.
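The volume-normalized connectivity definition in the final step can be sketched as below; the region indices, streamline weights, and volumes are hypothetical toy values, not outputs of the actual QSIPrep pipeline.

```python
import numpy as np

def structural_connectivity(streamline_weights, region_volumes):
    # Volume-normalized connectivity: for each region pair (a, b), the sum
    # of SIFT2 streamline weights connecting them divided by vol_a + vol_b.
    # streamline_weights: dict mapping (a, b) -> list of SIFT2 weights.
    n = len(region_volumes)
    C = np.zeros((n, n))
    for (a, b), ws in streamline_weights.items():
        C[a, b] = C[b, a] = sum(ws) / (region_volumes[a] + region_volumes[b])
    return C

# hypothetical 3-region example (the paper uses 116 AAL regions)
C = structural_connectivity({(0, 1): [0.5, 1.5], (1, 2): [3.0]},
                            region_volumes=[10.0, 10.0, 20.0])
```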

Training setup for generative models
The entire dataset was divided into a discovery dataset (70%; N = 75) and a test dataset (30%; N = 33). In this division, we sorted the samples by their objective variables and then assigned the samples with ranks (2nd, 3rd, 5th, 6th, ..., 103rd, 105th, 106th, 108th) to the discovery dataset and (1st, 4th, 7th, ..., 101st, 104th, 107th) to the test dataset. This method, proposed by Cui et al. [33], was employed to ensure a similar distribution of behavioral scores across datasets and minimize the random bias resulting from the division. The discovery dataset was used to train TG GAN II and WGAN-GP, select the data augmentation model, and construct the prediction model. Conversely, the test dataset validated the prediction model's accuracy. We adopted a hold-out validation scheme for the training and model selection processes in TG GAN II and WGAN-GP. Similarly to the discovery-test splitting, we sorted the discovery samples according to their objective variables. Then we assigned samples with ranks (2nd, 3rd, 5th, ..., 72nd, 74th, 75th) to the training dataset and (1st, 4th, ..., 73rd) to the validation dataset.
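The rank-interleaved split can be sketched as follows; this is our reading of the assignment rule (every third rank goes to the held-out set), demonstrated on toy scores rather than the actual dataset.

```python
import numpy as np

def rank_interleaved_split(scores, held_out_every=3):
    # Sort samples by the objective variable and send every third rank
    # (1st, 4th, 7th, ...) to the held-out set, the rest to discovery,
    # so both splits cover the full score distribution (after Cui et al.).
    order = np.argsort(scores)               # sample indices sorted by score
    held = order[::held_out_every]           # ranks 1, 4, 7, ...
    held_set = set(int(i) for i in held)
    disc = np.array([int(i) for i in order if int(i) not in held_set])
    return disc, held

scores = np.arange(9)[::-1]                  # 9 toy samples with distinct scores
disc, test = rank_interleaved_split(scores)
```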
The TG GAN II and WGAN-GP models were trained with the Adam optimizer with momentum parameters β1 = 0.9, β2 = 0.999 (learning rate: 0.0001, batch size: 2) and were run for 2000 epochs. We implemented an early stopping strategy to prevent overfitting on the training dataset: training was stopped if the regressor loss on the validation dataset did not improve for 50 consecutive epochs after reaching the 500th epoch.
The dropout rate p of the encoder, critics, and regressor was selected from the set {0.1, 0.2, 0.3, 0.4, 0.5}. Similarly, the coefficient γ was selected from a range between 0.1 and 1.0 in increments of 0.1. Among all trained TG GAN II models, we selected the one that minimized the sum of the regressor loss on the training and validation datasets for data augmentation. For the WGAN-GP models, which do not include a regressor, we selected the model with the same dropout rate as the chosen TG GAN II model.
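Model selection over this hyperparameter grid might look like the following sketch; the per-run losses here are hypothetical placeholders for the actual training and validation regressor losses.

```python
import itertools

def select_model(results):
    # Selection rule as described: over the grid of dropout p and
    # coefficient gamma, pick the run minimizing the sum of training and
    # validation regressor losses. results: (p, gamma) -> (train, val).
    return min(results, key=lambda k: sum(results[k]))

grid = itertools.product([0.1, 0.2, 0.3, 0.4, 0.5],
                         [round(0.1 * i, 1) for i in range(1, 11)])
# hypothetical losses: pretend the run at (p=0.3, gamma=0.5) is best
results = {pg: (1.0, 1.0) for pg in grid}
results[(0.3, 0.5)] = (0.2, 0.3)
best = select_model(results)
```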

Evaluation of synthesized connectivity matrices based on graph theory metrics
We evaluated the similarity between the structural connectivity matrices synthesized by TG GAN II and WGAN-GP and the original (acquired) matrices within the discovery dataset. First, we synthesized a connectivity matrix for each sample using these models. To quantify the differences between the acquired and synthesized matrices, we subtracted the synthesized matrices from their corresponding acquired matrices for each generative model. Then, we calculated the average difference matrix for each model. These averaged difference matrices were visualized and compared.
In the subsequent analysis, we quantitatively assessed the similarity between the acquired and synthesized matrices using graph theory metrics: connectivity strength, betweenness centrality, and clustering coefficient. For this purpose, we computed the average matrices for both acquired and synthesized connectivity matrices and binarized them according to edge density thresholds ranging from 5% to 25% in increments of 5%. We then estimated the distribution of these three graph theory metrics for the acquired and synthesized averaged matrices. The similarity between these distributions was quantified using the Kullback-Leibler (KL) divergence.
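A numpy-only sketch of the density thresholding and KL-divergence comparison: connectivity strength stands in for all three metrics, and shared-bin histograms replace the paper's density estimates (betweenness centrality and clustering coefficient would additionally require a graph library such as NetworkX or bctpy). All matrices here are random toy data.

```python
import numpy as np

def binarize_by_density(A, density):
    # Keep only the strongest `density` fraction of edges of a symmetric
    # matrix, with the diagonal forced to zero.
    triu = A[np.triu_indices_from(A, k=1)]
    thr = np.quantile(triu, 1 - density)
    return (A >= thr).astype(int) * (1 - np.eye(len(A), dtype=int))

def kl_divergence(p_samples, q_samples, bins=10):
    # KL divergence between two empirical distributions via histograms on
    # a shared range, smoothed to avoid log(0).
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = (p + 1e-9) / (p + 1e-9).sum()
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
A = rng.random((6, 6)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
B = binarize_by_density(A, density=0.25)
strength_acq = A.sum(axis=0)                 # node strength of "acquired" matrix
strength_syn = (A * 0.9).sum(axis=0)         # toy "synthesized" counterpart
kl = kl_divergence(strength_acq, strength_syn)
```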

Latent space evaluation
We assessed the relevance between the fluid intelligence scores and the latent space generated by the TG GAN II and WGAN-GP encoders. Latent variables z were sampled from the discovery dataset. We projected the acquired latent variables onto a two-dimensional space using principal component analysis (PCA) to visualize the latent space. Subsequently, we calculated the Pearson correlation between the first principal component (PC1) scores and the corresponding fluid intelligence scores.
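This evaluation reduces to a PCA projection plus a Pearson correlation, sketched here on synthetic latent variables whose first axis is constructed to track a toy score; the real latent variables come from the trained encoders.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
# toy "fluid intelligence" scores and latent variables whose first
# dimension carries the score-related variance
scores = rng.normal(100, 15, size=40)
Z = np.column_stack([scores + rng.normal(0, 1, 40),   # score-related axis
                     rng.normal(0, 1, 40),
                     rng.normal(0, 1, 40)])

pc = PCA(n_components=2).fit_transform(Z)             # project to 2-D
r, p = pearsonr(pc[:, 0], scores)                     # PC1 vs. scores
```

The sign of a principal component is arbitrary, so the magnitude of r is the quantity of interest.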

Performance evaluation on data augmentation for connectome-based prediction
To evaluate the effectiveness of TG GAN II, we built regression models to predict the fluid intelligence scores from the structural connectivity matrices augmented by the two generative models. We used the Ridge regression algorithm [34], which is widely used in the neuroscience field. The prediction performance for fluid intelligence was then compared between models augmented by TG GAN II and WGAN-GP.

Training setup for a prediction model
To assess the prediction model's efficacy within the discovery dataset, we utilized repeated 5-fold cross-validation (5F CV). Additionally, the L2 regularization parameter in Ridge regression was fine-tuned using 5-fold cross-validation within each fold of the outer 5F CV. The methodology for this repeated nested CV process is outlined in the subsequent sections.

Repeated outer 5F CV
In the outer 5F CV phase, the discovery dataset was randomly divided into five subsets. Four subsets were merged and used as the training dataset, while the remaining subset served as the validation dataset. We constructed the prediction model using the training dataset and the parameters determined during the inner 5F CV phase. This model was then validated against the validation dataset. This cycle of training and validation was repeated until each subset had been used as the validation dataset once. The prediction accuracy was calculated as the Pearson correlation between the observed and predicted scores. The outer 5F CV was repeated 20 times to avoid bias resulting from random splitting.
Inner 5F CV and parameter tuning

Within each iteration of the outer 5F CV, the L2 regularization parameter for Ridge regression was optimized through an inner 5F CV process. We selected this parameter from a set of 16 possible values: λ ∈ {2^n | n ∈ ℤ, n ∈ [−10, 5]}.
For the inner 5F CV, the training dataset from the outer loop was randomly split into five subsets.
Four subsets were used in each iteration to train the model under each parameter setting, with the fifth subset reserved for validation. This procedure was executed five times, ensuring each subset was utilized as the validation dataset. For each parameter across the inner 5F CV loops, we calculated the Pearson correlation and the mean absolute error (MAE) between the observed and predicted scores, averaging these metrics across all loops. The sum of the mean correlation and the inverse of the MAE, standardized to account for differing scales, served as the measure of prediction accuracy in inner validation [6], [33]. The parameter that yielded the highest-accuracy model was selected for the subsequent outer 5F CV.
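The nested CV can be sketched with scikit-learn as below; note that GridSearchCV's default R² scorer stands in for the combined correlation + inverse-MAE criterion described above, and the connectivity data are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, GridSearchCV, cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
X = rng.normal(size=(75, 20))                 # stand-in for flattened connectomes
y = X[:, 0] * 2 + rng.normal(0, 0.5, 75)      # toy objective variable

# the 16 candidate L2 parameters: lambda = 2^n, n in [-10, 5]
param_grid = {"alpha": [2.0 ** n for n in range(-10, 6)]}

inner = KFold(n_splits=5, shuffle=True, random_state=0)
outer = KFold(n_splits=5, shuffle=True, random_state=0)
model = GridSearchCV(Ridge(), param_grid, cv=inner)   # inner 5F CV tunes alpha
y_pred = cross_val_predict(model, X, y, cv=outer)     # outer 5F CV validates
r, _ = pearsonr(y, y_pred)                            # outer-loop accuracy
```

Repeating this with 20 different outer-fold seeds reproduces the repeated-CV scheme.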

Evaluating Test-Retest Reliability
To evaluate the test-retest reliability of the prediction models built through the repeated nested CV, the 100 prediction models from the outer 5F CV were validated using the test dataset. The prediction accuracy was defined by Pearson's correlation and the RMSE between observed and predicted scores.

Evaluation of prediction performance
Finally, we evaluated the accuracy of the fluid intelligence prediction models augmented by TG GAN II and WGAN-GP. Data augmentation was applied to the discovery dataset, increasing the number of structural connectivity matrices and fluid intelligence scores to double (original data + 100%), triple (+ 200%), quadruple (+ 300%), and quintuple (+ 400%) through the latent space interpolation detailed in section 2.4.6. Utilizing the original and augmented datasets, we constructed fluid intelligence prediction models using the repeated nested CV method outlined in section 2.5.1. These models were then validated against the test dataset. To compare the prediction accuracy between the data-augmented models and the baseline (non-augmented: original sample size) model, we conducted Welch's t-test. For this analysis, outlier detection was performed on the accuracy of the 100 prediction models, identifying any values more than three standard deviations from the mean as outliers. Additionally, we applied the Bonferroni correction for multiple comparisons.
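The statistical comparison (3-SD outlier removal, Welch's t-test, Bonferroni correction) might be implemented as follows, using hypothetical accuracy values for the 100 models.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_to_baseline(aug_acc, base_acc, n_comparisons=4):
    # Welch's t-test between augmented and baseline accuracies, after
    # dropping values more than 3 SD from the mean, with Bonferroni
    # correction over the number of augmentation ratios compared.
    def drop_outliers(a):
        a = np.asarray(a)
        return a[np.abs(a - a.mean()) <= 3 * a.std()]
    t, p = ttest_ind(drop_outliers(aug_acc), drop_outliers(base_acc),
                     equal_var=False)                 # Welch's variant
    return t, min(p * n_comparisons, 1.0)             # corrected p-value

rng = np.random.default_rng(3)
aug = rng.normal(0.30, 0.02, 100)     # toy accuracies of 100 augmented models
base = rng.normal(0.20, 0.02, 100)    # toy accuracies of 100 baseline models
t, p_corr = compare_to_baseline(aug, base)
```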

Analysis of synthesized connectivity matrices
Figure 5 presents the average connectivity matrices for both the acquired and the synthesized structural matrices using TG GAN II and WGAN-GP, with brain regions aligned according to Yeo's 7 Network definition. In Figure 5A, the weighted matrices are displayed, whereas Figure 5B shows those binarized at a 25% edge density threshold. The synthesized matrices from WGAN-GP exhibit connections that do not exist in the acquired matrices, a tendency not observed with TG GAN II. This is evident in both the weighted and binarized connectivity matrices. The prediction performance analysis revealed that data augmentation with TG GAN II significantly enhanced the correlation between observed and predicted scores beyond the baseline (p < .01, Bonferroni-corrected), whereas augmentation with WGAN-GP did not yield an improvement. Regarding RMSE, augmentation with neither TG GAN II nor WGAN-GP led to improvements. Moreover, data augmentation with WGAN-GP resulted in worse RMSE than the baseline, whereas the RMSE for models augmented with TG GAN II remained comparable to the baseline (Figure 10). These results were also consistent with the summary of the accuracy of the data-augmented and baseline prediction models presented in Table 2. Notably, increasing the number of synthesized samples did not lead to further improvements in prediction accuracy, even with TG GAN II.

The relationship between graph structure and prediction accuracy
In this study, we introduced TG GAN II, a novel method for augmenting brain connectivity data, to enhance the accuracy of predictive models for human cognitive traits and behavior. Furthermore, a graph theoretical analysis of both acquired and synthesized connectivity matrices showed that TG GAN II could produce structural connectivity matrices with topological features more closely matching those of the actual data than WGAN-GP. Garai et al. highlighted that network patterns or topological features in structural connectivity possess predictive power for human cognitive traits [36]. While Litwińczuk et al. observed no consistent advantage of graph theory measures over connectivity values for explaining and predicting cognitive functions in healthy populations, they noted instances where nodal graph theory metrics of the structural network outperformed raw connectivity models in predictive ability [37]. These prior studies support the view that the enhanced prediction accuracy observed with matrices synthesized by TG GAN II can be attributed to their graph theoretical features, which are more akin to those of the acquired matrices.
Consequently, our results suggest that the task-guided branch, implemented as the regression model in TG GAN II, could enhance the synthesis of relevant connectomic features for fluid intelligence prediction while suppressing ineffective features.

Latent space structure
The previous studies, the foundational work on Task-guided GANs [13] and BrainNetGAN [12], have demonstrated the efficacy of task-guided branches in enhancing task performance: specifically, improving age prediction accuracy from T1-weighted images and classification accuracy for Alzheimer's disease, respectively.
In our study, by visualizing the embedded latent space using PCA, we have shown that the task-guided branch not only improved task performance but also formed a latent space in which latent variables were significantly correlated with fluid intelligence scores (Figure 8). This visualization in two-dimensional space provides a novel insight absent from prior research, highlighting that our approach enables a deeper understanding of how input variables are integrated with outcome variables within the latent space of generative models. This finding emphasizes the unique contribution of our method, demonstrating that the task-guided branch is effective in forming a latent space that bridges input and outcome variables.

Limitations and future work
The demonstrated efficacy of TG GAN II in connectome-based prediction tasks suggests promising directions for further investigation, despite some limitations that merit discussion.
While our evaluation focused on synthesizing structural connectomes, the versatile framework of TG GAN II is not confined to this application alone; it holds potential for enhancing functional connectome-based predictions through data augmentation. In future work, we will explore whether TG GAN II's augmentation capabilities can indeed improve the accuracy of predictions based on functional connectomes.
Moreover, the initial application of our method was conducted on a relatively small dataset, a common scenario in studies where data augmentation aims to tackle the challenges posed by limited sample sizes. Consequently, the scalability and performance of TG GAN II in contexts involving larger datasets remain to be fully explored. Future studies are encouraged to extend the application of our method to more extensive datasets.

Acknowledgements
None to declare.

Figure 1.
Figure 1. The architecture of the WGAN-GP used for synthesizing connectivity matrices, serving as the baseline model, where P_r and P_g denote the distributions of given and synthesized connectivity matrices, respectively, and x and x̃ are the given input and synthesized matrices, respectively. The encoder transforms the input matrix into latent variables z, while the decoder reconstructs the matrix x̃ from z. The critics aim to estimate the Wasserstein distance between the given and synthesized matrices, while the generator seeks to minimize this distance.

Figure 2.
Figure 2. Architecture of BrainNetCNN. This architecture introduces convolution kernels designed for the adjacency matrix of brain networks, a development unique to BrainNetCNN. These kernels are applied to an element's entire row and column, enabling edge-to-edge (E2E) and edge-to-node (E2N) convolution operations. Initially, the connectivity matrix undergoes convolution with one or more E2E kernels, emphasizing the connections between adjacent brain regions. Subsequently, the output from the E2E convolution is processed through an E2N filter, which calculates a weighted sum of edges for each brain region. A node-to-graph (N2G) kernel then integrates these weighted node values to produce a single output value, representing an aggregate measure of brain connectivity.

Figure 3.
Figure 3. The architecture of Task-guided GAN II for synthesizing connectivity matrices. This model introduces modifications over the WGAN-GP, notably incorporating a task-guided regressor branch and specialized loss function calculations. The regressor branch features four convolutional layers and four fully connected layers, which together output the predicted objective variable based on the synthesized connectivity matrix. The loss function of the WGAN-GP is expanded to include the root mean squared error (RMSE) between the observed and predicted objective variables, facilitating more accurate predictions.

Figure 4.
Figure 4. Synthesis of connectivity matrices and their corresponding objective variables through latent vector interpolation. This process involves using two pairs of given connectivity matrices and objective variables, (x_i, y_i) and (x_j, y_j), as inputs. These samples are first mapped to the latent space using the trained generative model. A new set of latent variables is then generated by performing linear interpolation between these mapped points. The interpolated latent variables are fed into the decoder, which reconstructs the connectivity matrices.

Figure 6
Figure 6 displays the average difference in connectivity matrices between the acquired matrices and those synthesized by TG GAN II and WGAN-GP. Each part of the connectivity matrix differentiated the connectomes: intra-hemispherical connections (left hemisphere: A, right hemisphere: C), inter-hemispherical connections (B), cerebrum-cerebellum connections (D), and intra-cerebellar connections (E). The differences were notable in the intra-hemispherical and intra-cerebellar connections. Both TG GAN II and WGAN-GP tended to estimate weaker connections than those observed in the actual data. Additionally, WGAN-GP generated more inter-hemispherical and cerebrum-cerebellum connections than TG GAN II, indicating that TG GAN II synthesized connectivity matrices more similar to the acquired data in these regions.

Figure 5.
Figure 5. Comparison between acquired and synthesized structural connectivity matrices using Task-guided GAN II and WGAN-GP. (A) displays weighted connectivity matrices, and (B) shows binarized connectivity matrices, where binarization was applied to retain only the top 25% of the strongest connections. The brain regions in the matrices were aligned according to Yeo's 7 Network definition.

Figure 7
Figure 7 indicates a quantitative analysis of the topological features of both the acquired and synthesized connectivity matrices, utilizing graph theory metrics: connectivity strength, betweenness centrality, and clustering coefficient. The distribution of each graph metric is displayed as a density plot, along with the similarity between the distributions of the acquired and synthesized matrices estimated using KL divergence. Notably, the KL divergence values and density plots indicate no difference in the distribution of connectivity strength between matrices synthesized by TG GAN II and those by WGAN-GP. Conversely, the distributions of betweenness centrality and clustering coefficient suggested that TG GAN II produced matrices with topological characteristics more closely aligned with those of the acquired matrices than WGAN-GP. This is evidenced by smaller KL divergence values and the similarity of the density plots across all edge density thresholds.

Figure 7.
Figure 7. Distribution of graph theory metrics (connectivity strength, betweenness centrality, clustering coefficient) across acquired and synthesized connectivity matrices via TG GAN II and WGAN-GP under various threshold settings. The upper section displays results from Task-guided GAN II, while the lower section indicates WGAN-GP results. The similarity between the distributions of graph metrics is quantified using the Kullback-Leibler divergence.

Figure 8.
Figure 8. Visualization of the latent space through principal component analysis (PCA) on the left, and the correlation between the first principal component (PC 1) and the fluid intelligence score on the right.

Figure 9.
Figure 9. Accuracy of the prediction model, evaluated by Pearson's correlation between observed and predicted scores. (A) indicates data augmentation results using Task-guided GAN II, while (B) presents results from employing WGAN-GP.

Figure 10.
Figure 10. Accuracy of the prediction model, evaluated by the root mean squared error (RMSE) between observed and predicted scores. (A) indicates data augmentation results using Task-guided GAN II, while (B) presents results from employing WGAN-GP.
The analysis of the fluid intelligence prediction model, constructed using the NIMH Healthy Research Volunteer Dataset, showed that TG GAN II could synthesize structural connectivity matrices with features more closely aligned with the actual data than its baseline model, WGAN-GP. Furthermore, data augmentation with TG GAN II resulted in improved prediction accuracy. Upon examining the average synthesized and acquired matrices, it was observed that WGAN-GP tended to overgenerate inter-hemispherical connections compared to TG GAN II, failing to reflect the sparser inter-hemispherical connectivity present in the actual data. Zimmermann et al. have suggested that intra-hemispherical structural connectivity is crucial for capturing the variance in cognitive traits, including fluid intelligence [35], a notion consistent with the dominance of such connections in the acquired data. Unlike WGAN-GP, TG GAN II effectively minimized spurious inter-hemispherical connections while enhancing intra-hemispherical connections, thereby contributing to the enhanced predictive accuracy of the model.