Abstract
Identifying individuals with early mild cognitive impairment (EMCI) can be an effective strategy for early diagnosis and delaying the progression of Alzheimer’s disease (AD). Many approaches have been devised to discriminate those with EMCI from healthy control (HC) individuals, and selection of the most effective parameters has been one of the challenging aspects of these approaches. In this study we suggest an optimization method based on five evolutionary algorithms that can be used in the optimization of neuroimaging data with a large number of parameters. Resting-state functional magnetic resonance imaging (rs-fMRI), which measures functional connectivity, has been shown to be useful in the prediction of cognitive decline. Analysis of functional connectivity data using graph measures is a common practice that results in a great number of parameters. Using graph measures we calculated 1155 parameters from the functional connectivity data of HC (n=36) and EMCI (n=34) participants extracted from the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. These parameters were fed into the evolutionary algorithms to select a subset of parameters for classification of the data into the two categories of EMCI and HC using a two-layer artificial neural network. All algorithms achieved a classification accuracy of 94.55%, which is extremely high considering the single-modality input and the small number of participants. These results highlight the potential application of rs-fMRI and the efficiency of such optimization methods in the classification of images into HC and EMCI. This is of particular importance considering that MRI images of EMCI individuals cannot be easily identified by experts.
1 INTRODUCTION
Alzheimer’s disease (AD) is the most common type of dementia, with around 50 million patients worldwide 1,2. AD is usually preceded by a period of mild cognitive impairment (MCI) 3,4. Identifying subjects with MCI could be an effective strategy for early diagnosis and for delaying the progression of AD towards irreversible brain damage 5–7. While researchers have been successful, to some extent, in the diagnosis of AD, they have been significantly less successful in the diagnosis of MCI 8–11. In particular, detection of the early stages of MCI (EMCI) has proven to be very challenging 12–14. Therefore, in this study we propose a novel method based on evolutionary algorithms to select a subset of graph features calculated from functional connectivity data to discriminate between healthy participants (HC) and those with EMCI.
It has been shown that the brain goes through many functional and physiological changes prior to any obvious behavioral symptoms in AD 15–17. Therefore, many approaches have been devised based on biomarkers to distinguish between HC, different stages of MCI, and AD 18–20. For example, segmentation of structural magnetic resonance imaging (MRI) data has been used in many studies, as brain structure changes greatly in AD 21–24.
While structural neuroimaging has shown some success in early detection of AD, functional neuroimaging has proven to be a stronger candidate 25–27. Functional MRI (fMRI) allows for the examination of brain functioning while a patient performs a cognitive task. This technique is especially well suited to identifying changes in brain functioning before significant impairments can be detected on standard neuropsychological tests, and as such is sensitive to early identification of disease processes 28,29. While fMRI requires participants to perform a task, resting-state fMRI (rs-fMRI) measures the spontaneous fluctuations of brain activity without any task, and hence is less sensitive to individual cognitive abilities 30–32. One important feature of rs-fMRI is the ability to measure functional connectivity changes 33,34, and many recent studies have shown that functional connectivity changes are prevalent in AD 35–38.
Analysis of rs-fMRI data using graph theory measures is a powerful tool that enables characterization of the global, as well as local, characteristics of different brain areas 39–42. This method provides a way to comprehensively compare the functional connectivity organization of the brain between patients and controls 43–45, and has been used in the characterization of AD 46–48. It has also been used in the diagnostic classification of different stages of AD 49–51.
Since graph theory analysis of rs-fMRI data leads to a large number of parameters, it is essential to select an optimal subset of features that can lead to high discrimination accuracy 52,53. Feature selection is particularly complicated due to the non-linear nature of classification methods; for example, more parameters do not necessarily lead to better performance 54,55. Evolutionary algorithms (EAs) are biologically inspired algorithms that are extremely effective in optimization problems with large search spaces 56–60. EAs have been used in the characterization and diagnosis of AD 61–65.
In this study we devised a method that achieves higher accuracy in the classification of HC and EMCI participants than previously published research. We used MRI and rs-fMRI data from a group of healthy participants and a group with EMCI, and applied graph theory to extract a collection of 1155 parameters. These parameters were then given to five different EA methods to select an optimum subset, which was subsequently given to an artificial neural network to classify the data into the two groups of HC and EMCI. We aimed to identify the most suitable optimization method based on accuracy and training time, as well as to identify the most informative parameters.
2 Methods
2.1 Participants
Data for 70 participants were extracted from the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu) 66–68. Table 1 presents the details of the data. EMCI participants had no neurodegenerative diseases other than MCI. The EMCI participants were recruited with memory function approximately 1.0 SD below expected education-adjusted norms 69. HC subjects had no history of cognitive impairment, head injury, major psychiatric disease, or stroke.
Demographics of the participants included in this study.
2.2 Proposed Method
Structural T1-MRI and rs-fMRI data were extracted from the ADNI database 68. The data were processed with the CONN toolbox 70 in MATLAB v2018 (MathWorks, California, US). CONN is a tool for preprocessing, processing, and analysis of functional connectivity data. Preprocessing consisted of reducing subject motion, image distortions, and magnetic field inhomogeneity effects, and applying denoising methods to reduce physiological effects and other sources of noise. The processing stage consisted of extraction of functional connectivity and graph theory measures. In this stage, through two pipelines, a collection of 1155 parameters was extracted 70,71. These parameters were then given to one of the dimension reduction methods (five EAs and one statistical method) to select a subset of features. The selected features were finally given to an artificial neural network to classify the data into the two categories of healthy control (HC) and EMCI. See Figure 1 for a summary of the procedure.
Procedure of the proposed method. T1-MRI and resting-state fMRI (rs-fMRI) data of healthy participants (HC; n=36) and patients with early mild cognitive impairment (EMCI; n=34) were extracted from the ADNI database 68. Preprocessing, parcellation of brain areas (116 regions based on AAL), and extraction of the functional connectivity (49 network parameters) as well as the seven graph parameters were done using the CONN toolbox 70. Subsequently the 1155 (116×7 + 49×7) extracted parameters were given to one of the optimization methods to select the subset of parameters that leads to the best classification. Optimization methods consisted of five evolutionary algorithms (boxes with grey shading) and one statistical algorithm. The outputs of these methods were given to an artificial neural network (ANN) with two hidden layers to classify the data into HC and EMCI. AAL: automated anatomical labeling atlas 75; GA: genetic algorithm; NSGA-II: nondominated sorting genetic algorithm II; ACO: ant colony optimization; SA: simulated annealing; PSO: particle swarm optimization; seven graph features: degree centrality, betweenness centrality, path length, clustering coefficient, local efficiency, cost and global efficiency.
2.3 Data acquisition and preprocessing
Brain structural T1-weighted MRI data with 256×256×170 voxels and 1×1×1 mm3 voxel size were extracted for all subjects. MRI preprocessing steps consisted of non-uniformity correction; segmentation into grey matter, white matter, and cerebrospinal fluid (CSF); and spatial normalization to MNI space.
rs-fMRI data were obtained using an echo-planar imaging sequence on a 3T Philips MRI scanner. Acquisition parameters were: 140 time points, repetition time (TR) = 3000 ms, echo time (TE) = 30 ms, flip angle = 80°, number of slices = 48, slice thickness = 3.3 mm, spatial resolution = 3×3×3 mm3, and in-plane matrix = 64×64. fMRI preprocessing steps consisted of motion correction, slice timing correction, spatial normalization to MNI space, and band-pass filtering to keep only low-frequency (0.01–0.1 Hz) fluctuations. T1-MRI and rs-fMRI data processing was done using the CONN toolbox 70.
2.4 Functional Connectivity
Functional connectivity, also called “resting-state” connectivity, is a measure of the temporal correlations among the blood-oxygen-level-dependent (BOLD) signal fluctuations in different brain areas 72–74. In this study, we obtained ROI-to-ROI functional connectivity of brain areas according to the Harvard-Oxford atlas 75. The functional connectivity matrix is the correlation, covariance, or mutual information between the fMRI time series of every two brain regions, stored in an n × n matrix for each participant, where n is the number of brain regions obtained by atlas parcellation 74. To extract functional connectivity between different brain areas we used the Pearson correlation coefficient as follows 70,76:

$$r(x) = \frac{\int S(x,t)\,R(t)\,dt}{\left(\int S^{2}(x,t)\,dt \int R^{2}(t)\,dt\right)^{1/2}}, \qquad Z(x) = \tanh^{-1}\!\big(r(x)\big),$$

where S is the BOLD time series at each voxel (for simplicity, all time series are considered centered to zero mean), R is the average BOLD time series within an ROI, r is the spatial map of Pearson correlation coefficients, and Z is the seed-based correlations (SBC) map of Fisher-transformed correlation coefficients for this ROI 77.
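As an illustration only (not the CONN implementation), the ROI-to-ROI version of this computation can be sketched in a few lines of Python, assuming a hypothetical `bold` array holding preprocessed, zero-centered ROI time series for one participant:

```python
# Sketch: ROI-to-ROI functional connectivity with Fisher transform.
# `bold` is a synthetic stand-in for preprocessed BOLD data,
# shaped (time points, ROIs); here 140 volumes and 116 regions.
import numpy as np

rng = np.random.default_rng(0)
bold = rng.standard_normal((140, 116))   # synthetic data for illustration only

r = np.corrcoef(bold, rowvar=False)      # 116 x 116 Pearson correlation matrix
np.fill_diagonal(r, 0.0)                 # discard self-connections before transforming
z = np.arctanh(r)                        # Fisher z-transform of the correlations
```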
2.5 Graph Parameters
We used graph theory to study the topological features of functional connectivity graphs across multiple regions of the brain 49,78. Graph nodes represented brain regions and edges represented interregional resting-state functional connectivity. The functional connectivity matrix was employed to estimate common graph features, including (1) degree centrality (the number of edges that connect a node to the rest of the network), (2) betweenness centrality (the proportion of shortest paths between all node pairs in the network that pass through a given index node), (3) average path length (the average distance from each node to any other node), (4) clustering coefficient (the proportion of ROIs that have connectivity with a particular ROI that also have connectivity with each other), (5) cost (the ratio of the existing number of edges to the number of all possible edges in the network), (6) local efficiency (the network’s ability to transmit information at the local level), and (7) global efficiency (the average inverse shortest path length in the network; this parameter is inversely related to the path length) 79.
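A hedged sketch of how these seven measures could be computed with the `networkx` library is given below; the binarization threshold of 0.25 is an arbitrary illustrative choice, not a value taken from the study.

```python
# Sketch: the seven graph measures from a Fisher-transformed
# connectivity matrix `z`, binarized at an illustrative threshold.
import networkx as nx
import numpy as np

def graph_measures(z, threshold=0.25):
    adj = (np.abs(z) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    g = nx.from_numpy_array(adj)
    return {
        "degree_centrality": nx.degree_centrality(g),
        "betweenness_centrality": nx.betweenness_centrality(g),
        # average_shortest_path_length assumes the graph is connected
        "path_length": nx.average_shortest_path_length(g),
        "clustering_coefficient": nx.clustering(g),
        "cost": nx.density(g),            # existing edges / possible edges
        "local_efficiency": nx.local_efficiency(g),
        "global_efficiency": nx.global_efficiency(g),
    }
```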
2.6 Dimension Reduction Methods
We used five EAs to select the most efficient set of features. These algorithms are as follows:
Genetic algorithm (GA)
GA is one of the most advanced algorithms for feature selection 80. It is based on the mechanics of natural genetics and biological evolution and searches for the optimum solution in five steps: selection of the initial population, evaluation of the fitness function, pseudo-random selection, crossover, and mutation 81. For further information refer to the supplementary Methods section. Single-point, double-point, and uniform crossover methods are used to generate new members. In this study we used a mutation percentage of 0.3 and a mutation rate of 0.1, with 20 members per population, a crossover percentage of 14, and a selection pressure of 8 63,82.
Nondominated sorting genetic algorithm II (NSGA-II)
NSGA is a method for solving multi-objective optimization problems that captures a number of solutions simultaneously 83. All the operators of GA are also used here. NSGA-II uses binary features to fill a mating pool. Nondomination and crowding distance are used to sort the new members. For further information refer to the supplementary Methods section. In this study the mutation percentage and mutation rate were set to 0.4 and 0.1, respectively; the population size was 25, and the crossover percentage was 14%.
Ant colony optimization algorithm (ACO)
ACO is a metaheuristic optimization method based on the behavior of ants 84. The algorithm consists of four steps: initialization; creation of ant solutions (a set of ants builds a solution to the problem using pheromone values and other information); local search (improvement of the created solutions by the ants); and global pheromone update (updating the pheromone variables based on the search actions followed by the ants) 85. ACO requires the problem to be described as a graph: nodes represent features, and edges indicate which features should be selected next. In feature selection, ACO tries to find the best solutions using prior information from previous iterations. The search for the optimal feature subset consists of an ant traversing the graph with the minimum number of nodes required to satisfy the stopping criterion 86. For further information refer to the supplementary Methods section. We used 10, 0.05, 1, 1 and 1 for the number of ants, evaporation rate, initial weight, exponential weight, and heuristic weight, respectively.
Simulated annealing (SA)
SA is a stochastic search algorithm that is particularly useful in large-scale linear regression models 87. In this algorithm, the new feature subset is selected entirely at random based on the current state. After an adequate number of iterations, a dataset can be created to quantify the difference in performance with and without each predictor 88,89. For further information refer to the supplementary Methods section. We set the initial temperature and temperature reduction rate to 10 and 0.99, respectively.
Particle swarm optimization (PSO)
PSO is a stochastic optimization method based on the behavior of swarming animals such as birds and fish. Each member finds optimal regions of the search space by coordinating with other members in the population. In this method, each possible solution is represented as a particle with a certain position and velocity moving through the search space 90–92. Particles move based on a cognitive parameter (defining the degree of acceleration towards the particle’s individual local best position) and a social parameter (defining the acceleration towards the global best position). The overall rate of change is defined by an inertia parameter. For further information refer to the supplementary Methods section. We used a swarm size of 20; the cognitive and social parameters were set to 1.5, and the inertia to 0.72.
Statistical approach
To create a baseline against which to compare the EA-based dimension reduction methods, we also used a statistical approach, selecting features based on the statistical difference between the two groups. We compared the 1155 parameters using independent two-sample t-tests, and subsequently selected the parameters with the smallest p values, as sketched below.
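A minimal sketch of this baseline, assuming hypothetical feature matrices `features_hc` and `features_emci` of shape (subjects × 1155):

```python
# Sketch: rank features by two-sample t-test p-value and keep the k smallest.
import numpy as np
from scipy import stats

def ttest_feature_ranking(features_hc, features_emci, k):
    _, p = stats.ttest_ind(features_hc, features_emci, axis=0)
    return np.argsort(p)[:k]   # indices of the k most significant features
```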
2.7 Classification Method
For classification of EMCI and HC we used a multi-layer perceptron artificial neural network (ANN) with two fully-connected hidden layers of 10 nodes each. Classification was performed via 10-fold cross-validation. We used the Levenberg-Marquardt backpropagation (LMBP) algorithm for training 93–95 and mean squared error as the measure of performance. The LMBP has three steps: (1) propagate the input forward through the network; (2) propagate the sensitivities backward through the network from the last layer to the first layer; and finally (3) update the weights and biases using Newton’s computational method 93. In the LMBP algorithm the performance index F(x) is formulated as:

$$F(\mathbf{x}) = \mathbf{e}^{T}(\mathbf{x})\,\mathbf{e}(\mathbf{x}),$$

where e is the vector of network errors, and x is the vector of network weights and biases. The network weights are updated using the Hessian matrix and its gradient:

$$\mathbf{x}_{k+1} = \mathbf{x}_{k} - \left[\mathbf{J}^{T}(\mathbf{x}_{k})\,\mathbf{J}(\mathbf{x}_{k}) + \mu\mathbf{I}\right]^{-1}\mathbf{J}^{T}(\mathbf{x}_{k})\,\mathbf{e}(\mathbf{x}_{k}),$$

where J represents the Jacobian matrix, μ is a scalar damping parameter, and I is the identity matrix. The Hessian matrix H and its gradient G are calculated using:

$$\mathbf{H} = \mathbf{J}^{T}\mathbf{J}, \qquad \mathbf{G} = \mathbf{J}^{T}\mathbf{e},$$

where the Jacobian matrix is calculated by:

$$[\mathbf{J}]_{h,l} = \frac{\partial e_{h}}{\partial x_{l}} = s^{m}_{i}\, a^{m-1}_{j},$$

where a^(m−1) is the output of the (m − 1)th layer of the network, and s^m is the sensitivity of F(x) to changes in the network input element in the mth layer, calculated by the backward recurrence:

$$\mathbf{s}^{m} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\,(\mathbf{W}^{m+1})^{T}\,\mathbf{s}^{m+1},$$

where W^(m+1) represents the neuron weights at the (m + 1)th layer, and n^m is the network input to the mth layer 93.
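A minimal sketch of a single LMBP weight update under these equations, assuming the error vector `e` and Jacobian `J` have already been computed for the current weight vector `x`:

```python
# Sketch: one Levenberg-Marquardt update,
# x_{k+1} = x_k - (J^T J + mu I)^{-1} J^T e.
import numpy as np

def lm_step(x, e, J, mu=1e-2):
    H = J.T @ J   # Gauss-Newton approximation of the Hessian
    G = J.T @ e   # gradient of the performance index
    return x - np.linalg.solve(H + mu * np.eye(len(x)), G)
```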
3 RESULTS
The preprocessing and processing of the data were successful. We extracted 1155 graph parameters per participant (see Supplementary Figures 1–11). These data were used for the data optimization step. Using the five EA optimization methods and the statistical method, we investigated the classification performance for different numbers of parameters per subset. Figure 2 shows the performance of these methods for 100 subsets containing 1 to 100 parameters. These plots are based on 200 repetitions of the EA algorithms. To investigate the performance of the algorithms with more repetitions, we ran the same algorithms with 500 repetitions. These simulations showed no major improvement with increased repetitions (maximum 0.84% improvement; see Supplementary Figure 12).
Classification performance of the five evolutionary algorithm (EA) methods and the statistical method for parameter subsets with 1 to 100 elements. The light blue color shows the average of the five EA algorithms. The number in the top left-hand corner represents the difference between the relevant plot and the mean performance of the EA methods. The green subplot in each panel represents the superiority of the relevant EA over the statistical method for the 100 different subsets. The percentage value above the subplot shows the mean superior performance over the 100 subsets compared to the statistical method. These plots show that the EAs performed significantly better than the statistical method. GA: genetic algorithm; NSGA-II: nondominated sorting genetic algorithm II; ACO: ant colony optimization; SA: simulated annealing; PSO: particle swarm optimization.
A threshold of 90% was chosen as the desired performance accuracy. The performance of the statistical method was consistently below this threshold. The five EA methods achieved this performance with varying numbers of parameters. Figure 3 shows the accuracy percentage and the optimization speed of the five EA methods.
Performance of the five evolutionary algorithms (EA) in terms of (a) percentage accuracy and (b) optimization speed. The values in the legend of panel (a) show the minimum number of parameters required to achieve minimum 90% accuracy. The values in the legend of panel (b) show the minimum optimization speed to achieve minimum 90% accuracy. GA: genetic algorithm; NSGA-II: nondominated sorting genetic algorithm II; ACO: ant colony optimization; SA: simulated annealing; PSO: particle swarm optimization.
To investigate whether increasing the number of parameters would increase performance, we performed similar simulations with a maximum of 500 parameters in each subset. This analysis showed that the performance of the optimization methods plateaus, with no significant increase beyond 100 parameters (Figure 4). Performance of the optimization methods was between 92.55–93.35% and 94.27–94.55% for filtered and absolute accuracy, respectively. These accuracies are significantly higher than the 81.97% and 87.72% filtered and absolute accuracy achieved in the statistical classification condition.
Performance of different optimization methods for increased number of parameters per subset. The light blue dots indicate the performance of algorithms for each subset of parameters. The dark blue curve shows the moving average of the samples with window of ±20 points (Filtered Data). The red curve shows the mean performance of the five evolutionary algorithms. GA: genetic algorithm; NSGA-II: nondominated sorting genetic algorithm II; ACO: ant colony optimization; SA: simulated annealing; PSO: particle swarm optimization.
To investigate the contribution of different parameters to the optimization of classification, we looked at the distribution of parameters in the 100 subsets calculated above (Figure 5). For GA and NSGA-II the majority of the subsets consisted of repeated parameters: out of the 1155 parameters, only about 200 were selected across the 100 subsets. SA, ACO and PSO, on the other hand, showed a more diverse selection of parameters: almost all parameters appeared in at least one of the 100 subsets.
Distribution of different parameters over the 100 subsets of parameters. (a) Percentage of presence of the 1155 parameters. In the Statistical method, which is not present in the plot, the first parameter was repeated in all the 100 subsets. Numbers in the legend show the percentage repetition of the most repeated parameter. (b) Cumulative number of unique parameters over the 100 subsets of parameters. This plot shows that GA and NSGA2 concentrated on a small number of parameters, while the SA, ACO and PSO selected a more diverse range of parameters in the optimization. Numbers in the legend show the number of utilized parameters in the final solution of the 100 subsets of parameters. GA: genetic algorithm; NSGA-II: nondominated sorting genetic algorithm II; ACO: ant colony optimization; SA: simulated annealing; PSO: particle swarm optimization.
4 Discussion
Using the CONN toolbox we extracted 1155 graph parameters from rs-fMRI data. The optimization methods showed superior performance over statistical analysis (average 20.93% superiority). The performance of the EA algorithms did not differ greatly (ranging 92.55–93.35% and 94.27–94.55% for filtered and absolute accuracy, respectively), with PSO performing the best (mean 0.96% superior performance) and SA performing the worst (mean 1.07% inferior performance) (Figure 2). The minimum number of parameters required to guarantee at least 90% accuracy differed greatly across the methods (PSO and SA requiring 7 and 49 parameters, respectively). The processing time to achieve at least 90% accuracy also differed across the EA methods (SA and NSGA2 taking 5.1 s and 22.4 s per optimization) (Figure 3). Increasing the number of parameters per subset did not greatly increase the performance accuracy of the methods (Figure 4).
Classification of data into AD and HC has been investigated extensively. Many methods have been developed using different modalities of biomarkers, and some of these studies achieved accuracies greater than 90% 96. Classification of the earlier stages of AD, however, has been more challenging; only a handful of studies have achieved accuracy higher than 90% (Table 2). The majority of these studies implemented convolutional and deep neural networks that require extended training and testing durations and large amounts of input data. For example, Payan et al. (2015) applied convolutional neural networks (CNN) to a collection of 755 HC and 755 MCI participants and achieved an accuracy of 92.1% 97. Similarly, Wang et al. (2019) applied deep neural networks to 209 HC and 384 MCI data and achieved an accuracy of 98.4% 98 (see also 99–102). We applied our method to a group of only 70 participants and achieved an accuracy of 94.55%. To the best of our knowledge, among all the studies published to date, this accuracy level is the second highest, after Wang et al. (2019) 98, which used 593 participants in total.
Summary of the studies aiming at categorization of healthy (HC) and mild cognitive impairment (MCI) using different biomarkers and classification methods. Only best performance of each study is reported for each group of participants and classification method. Further details of the following studies are in Supplementary Table 1.
Summary of the studies aiming at categorization of healthy (HC), mild cognitive impairment (MCI) and Alzheimer’s disease (AD) using different biomarkers and classification methods based on Table 1.
Research has shown that combining information from different modalities supports higher classification accuracies. For example, Forouzannezhad et al. (2018) showed that a combination of PET, MRI, and neuropsychological test scores (NTS) can improve performance by more than 20% compared to PET or MRI alone 102. In another study, Kang et al. (2020) showed that a combination of diffusion tensor imaging (DTI) and MRI can improve accuracy by more than 20% compared to DTI or MRI alone 132. Our analysis, while achieving superior accuracy compared to the majority of prior methods, was based on a single MRI-derived biomarker, which has lower computational complexity than multi-modality data.
Interpretability of the selected features is one advantage of using evolutionary algorithms as the basis of the optimization. This is in contrast with algorithms based on CNNs or deep neural networks (DNNs), which are mostly considered black boxes 134. Although research has shown some progress in understanding the link between the features used by such systems and their predictions, these methods remain difficult to verify 135,136. This has reduced trust in the internal functionality and reliability of such systems in clinical settings 137. Our suggested method selects features based on the activity of distinct brain areas, which are easy to interpret and understand 64,138. This can inform future research by bringing the focus to the brain areas, and the links between brain areas, that are most affected by mild cognitive impairment.
We implemented five of the most common evolutionary algorithms. They showed similar overall optimization performance, ranging between 92.55–93.35% and 94.27–94.55% for filtered and absolute accuracy, respectively. They differed, however, in optimization curve, optimization time, and diversity of the selected features. PSO could guarantee 90% accuracy with only 7 features, whereas SA required 49 features. Although SA required more features to guarantee 90% accuracy, it was the fastest optimization algorithm, taking only 5.1 s for 49 features; NSGA2, on the other hand, required 22.4 s to guarantee 90% accuracy. These findings show the diversity of the algorithms and their suitability for different applications requiring the highest accuracy, the fewest features, or the fastest optimization time 56,61,139.
One distinct characteristic of GA and NSGA-II was a more focused search amongst features compared to the other methods. GA and NSGA-II selected 222 and 224 distinct features in the first 100 parameter sets, respectively, while the other methods covered almost the whole collection of features (more than 97.6%). Notably, GA and NSGA-II showed the “curse of dimensionality” (also known as the “peaking phenomenon”), with an optimal number of features of around 50 140–143. Therefore, the features selected by GA and NSGA-II are perhaps more indicative of the distinct differences between HC and EMCI.
In this study, we proposed a method for classification of the EMCI and HC groups using graph theory. Our results highlight the potential of graph analysis of functional connectivity, and the efficiency of evolutionary algorithms combined with a simple perceptron ANN, in the classification of images into HC and EMCI. We proposed a fully automatic procedure for prediction of the early stages of AD using rs-fMRI data features. This is of particular importance considering that MRI images of EMCI individuals cannot be easily identified by experts. Further development of such methods can prove to be a powerful tool in the early diagnosis of AD.
Conflict of Interest
The authors declare that they have no conflict of interest.
Author Contributions
JZ and AHJ conceived the study. JZ extracted the data. JZ and AHJ analyzed the data. JZ and AHJ wrote the paper. JZ, AS and AHJ revised the manuscript. AS and AHJ supervised the project.
Supplementary Methods
1 Genetic algorithm (GA)
The procedure of GA consists of the following four steps 1:
Individual encoding: each individual is encoded as a binary vector of size P, where entry bi = 1 indicates that predictor pi is included in that individual, and bi = 0 that it is not (i = 1,…, P).
Initial population: given the binary representation of the individuals, the population is a binary matrix whose rows are the randomly selected individuals and whose columns are the available predictors. An initial population with a predefined number of individuals is generated by randomly assigning 0 or 1 to each entry.
Fitness function: the fitness value of each individual in the population is calculated using a predefined fitness function. Individuals with the lowest prediction error and fewer predictors are selected for the next generation.
Genetic operators: applying genetic operators to create the next generation.
The genetic operators are Selection (random selection of members based on their fitness value; fitter members are more likely to be chosen), Crossover (the new generation is created by exchanging elements between the two parents selected in the previous step), Mutation (elements in a selected member are changed), and Stop Criteria (the criteria that indicate the end of the search) 1. In our study we used roulette wheel selection to select possible valuable solutions for producing offspring for the next generation; a minimal sketch of this loop follows.
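The following is an illustrative sketch of such a GA loop for binary feature selection, not the exact implementation used in the study; `fitness_fn` (e.g., classifier accuracy for a given feature mask, assumed non-negative) is a hypothetical helper:

```python
# Sketch: GA feature selection with roulette wheel selection,
# single-point crossover, and bit-flip mutation.
import numpy as np

rng = np.random.default_rng(1)

def roulette_select(pop, fitness):
    # fitter members are proportionally more likely to be chosen
    return pop[rng.choice(len(pop), p=fitness / fitness.sum())]

def crossover(a, b):
    cut = rng.integers(1, len(a))          # single-point crossover
    return np.concatenate([a[:cut], b[cut:]])

def mutate(member, rate=0.1):              # mutation rate 0.1, as in the main text
    flip = rng.random(len(member)) < rate
    out = member.copy()
    out[flip] ^= 1                         # flip selected bits
    return out

def ga(fitness_fn, n_features=1155, pop_size=20, generations=50):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        fit = np.array([fitness_fn(m) for m in pop])
        pop = np.array([mutate(crossover(roulette_select(pop, fit),
                                         roulette_select(pop, fit)))
                        for _ in range(pop_size)])
    fit = np.array([fitness_fn(m) for m in pop])
    return pop[np.argmax(fit)]             # best binary feature mask found
```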
2 Nondominated sorting genetic algorithm II (NSGA-II)
Nondomination and crowding distance are used to sort the new members. A specific number of individuals in the sorted population are transferred to the next generation. The conventional NSGA algorithm has a computational complexity of O(MN3), where M is the number of objectives and N is the population size. NSGA-II, on the other hand, has an overall complexity of O(MN2), which is significantly lower 2. After termination of the optimization process, the nondominated solutions form the Pareto frontier. Each of the solutions on the Pareto frontier can be considered an optimal strategy for a specific situation 3–5.
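A sketch of the crowding distance computation for one nondominated front, assuming a hypothetical array `objs` of objective values (e.g., prediction error and number of selected features) with one row per solution:

```python
# Sketch: NSGA-II crowding distance; boundary solutions get infinite
# distance so they are always preserved.
import numpy as np

def crowding_distance(objs):
    n, m = objs.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(objs[:, j])
        dist[order[0]] = dist[order[-1]] = np.inf
        span = objs[order[-1], j] - objs[order[0], j]
        if span == 0:
            continue                       # all solutions equal on this objective
        dist[order[1:-1]] += (objs[order[2:], j] - objs[order[:-2], j]) / span
    return dist
```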
3 Ant colony optimization algorithm (ACO)
See Figure 1 for the procedure of the traverse of an ant placed at node a. This ant has a choice of which feature to add next to its path (dotted lines). It traverses through the graph to find a path that satisfies the stopping criterion (e.g., a suitably high classification accuracy has been achieved with this subset). In this example, the ant chooses next feature b based on a set of transition rules, then c and then d. Upon arrival at d, the current subset {a; b; c; d} is determined to satisfy the traversal stopping criterion. At termination of search, the algorithm outputs this feature subset as a candidate for data reduction 6.
A sample example of ant traveling through multiple features in ant colony optimization algorithm (ACO). Here feature subset of {a; b; c; d} is selected as a possible solution 6.
The probability that ant n at feature i chooses to travel to feature j at time t is:

$$p^{n}_{ij}(t) = \begin{cases} \dfrac{[\vartheta_{ij}(t)]^{\alpha}\,[\varphi_{ij}]^{\beta}}{\sum_{l \in N^{n}_{i}} [\vartheta_{il}(t)]^{\alpha}\,[\varphi_{il}]^{\beta}} & \text{if } j \in N^{n}_{i} \\[1ex] 0 & \text{otherwise,} \end{cases}$$

where φij is the heuristic desirability of choosing feature j when at feature i, N^n_i is the set of nodes next to node i that have not yet been visited by ant n, α > 0 and β > 0 are two parameters that determine the relative importance of the pheromone value and heuristic information, respectively, and ϑij is the amount of virtual pheromone on edge (i, j). The pheromone on each edge is updated according to the following formula 6:

$$\vartheta_{ij}(t+1) = (1 - \rho)\,\vartheta_{ij}(t) + \Delta\vartheta_{ij}(t), \qquad \Delta\vartheta_{ij}(t) = \sum_{n} \frac{\gamma(F^{n})}{|F^{n}|},$$

where the sum runs over the ants whose paths traversed edge (i, j); Δϑij(t) is 0 otherwise. The value 0 ≤ ρ ≤ 1 is a decay constant used to simulate the evaporation of the pheromone. The pheromone is updated according to both the measure of the “goodness” of the ant’s feature subset, γ(Fn), and the size of the subset itself, |Fn|; by this definition, all ants update the pheromone 6. Fn is the feature subset found by ant n.
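These two rules can be sketched directly in code; `tau` (pheromone) and `phi` (heuristic desirability) are assumed precomputed matrices, and the evaporation rate matches the 0.05 reported in the main text:

```python
# Sketch: ACO transition probabilities and pheromone update for
# feature selection.
import numpy as np

def transition_probs(i, unvisited, tau, phi, alpha=1.0, beta=1.0):
    w = (tau[i, unvisited] ** alpha) * (phi[i, unvisited] ** beta)
    return w / w.sum()                     # probability of each unvisited feature

def update_pheromone(tau, subsets, goodness, rho=0.05):
    tau = (1.0 - rho) * tau                # evaporation with decay constant rho
    for path, g in zip(subsets, goodness):
        for i, j in zip(path, path[1:]):   # edges traversed by this ant
            tau[i, j] += g / len(path)     # deposit scaled by goodness and subset size
    return tau
```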
4 Simulated annealing (SA)
SA utilizes a certain probability to accept a worse solution. The algorithm starts with a randomly generated solution; in each iteration, a neighbor of the best solution so far is generated according to a predefined neighborhood structure and evaluated using a fitness function. Improving moves are always accepted, whilst worse neighbors are accepted with a probability determined by the Boltzmann probability, $P = e^{-\theta/T}$, where θ is the difference between the fitness of the best solution and that of the generated neighbor, and T is a temperature, which periodically decreases during the search process according to a certain cooling schedule. The current temperature T is initially set to a very large number 7,8.
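A minimal sketch of this acceptance rule with geometric cooling, using the initial temperature of 10 and reduction rate of 0.99 reported in the main text; `fitness` and `neighbor` are hypothetical helpers:

```python
# Sketch: simulated annealing with Boltzmann acceptance and geometric cooling.
import math
import random

def simulated_annealing(fitness, neighbor, init, T=10.0, cooling=0.99, iters=1000):
    best = init
    for _ in range(iters):
        cand = neighbor(best)
        theta = fitness(best) - fitness(cand)  # positive when the candidate is worse
        if theta <= 0 or random.random() < math.exp(-theta / T):
            best = cand                        # accept improving or, sometimes, worse moves
        T *= cooling                           # cooling schedule
    return best
```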
5 Particle swarm optimization (PSO)
In a PSO with an N-dimensional search space, the particle velocity and position are updated by:

$$v_{ij}(t+1) = w\,v_{ij}(t) + c_{p}\,r_{p}\,[p_{ij} - x_{ij}(t)] + c_{g}\,r_{g}\,[p_{gj} - x_{ij}(t)],$$
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1),$$

where vi and xi refer to the velocity and position of particle i, respectively, and j ranges from 1 to N (the total number of features). cp is the cognitive parameter, defining the degree of acceleration towards the particle’s individual local best position pij. cg is a social parameter, defining the acceleration towards the global best position pgj. w is an inertia parameter, regulating the overall rate of change. The stochastic nature of the velocity equation is represented by rp and rg, which are random numbers in the range [0, 1]. To maintain coherence in the swarm, the maximum velocity is regulated by a parameter vmax. In standard PSO implementations, typically vmax = |xmax − xmin|.
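A sketch of one update under these equations, with cp = cg = 1.5 and w = 0.72 as in the main text; for feature selection, the continuous positions would additionally be thresholded into a binary feature mask before evaluation, and the vmax of 4.0 here is an illustrative choice:

```python
# Sketch: one PSO velocity and position update with velocity clamping.
import numpy as np

rng = np.random.default_rng(2)

def pso_step(x, v, p_best, g_best, cp=1.5, cg=1.5, w=0.72, vmax=4.0):
    rp, rg = rng.random(x.shape), rng.random(x.shape)
    v = w * v + cp * rp * (p_best - x) + cg * rg * (g_best - x)
    v = np.clip(v, -vmax, vmax)            # keep velocities within [-vmax, vmax]
    return x + v, v
```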
A sample collection of network and region-of-interest (ROI) connectivity matrices using rs-fMRI data. The colors indicate the t-value of one-sample t-test statistics.
Functional connectivity for brain areas with statistically significant correlation with other regions of interest (ROI).
All of the graph parameters in one view.
Graph parameter – average path length (the average distance from each node to any other node)
Graph parameter – betweenness centrality (the proportion of shortest paths between all node pairs in the network that pass through a given index node)
Graph parameter – clustering coefficient (the proportion of ROIs that have connectivity with a particular ROI that also have connectivity with each other)
Graph parameter – cost (the ratio of the existing number of edges to the number of all possible edges in the network)
Graph parameter – degree centrality (the number of edges that connect a node to the rest of the network)
Graph parameter – local efficiency (the network ability in transmitting information at the local level)
Graph parameter – global efficiency (the average inverse shortest path length in the network; this parameter is inversely related to the path length)
Comparison of classification performance for 200 repetitions (light blue) and 500 repetitions (dark blue) for the different optimization algorithms per parameter set. The subplots show the difference between 200 and 500 repetitions, indicating a small superior performance for 500 repetitions. This indicates that the algorithms converged within the first 200 repetitions.
Acknowledgements
The authors would like to thank Oliver Herdson for his comments and proofreading the manuscript.