Eigenvector-based community detection for identifying information hubs in neuronal networks

Eigenvectors of networked systems are known to reveal central, well-connected, network vertices. Here we expand upon the known applications of eigenvectors to define well-connected communities where each is associated with a prominent vertex. This form of community detection provides an analytical approach for analysing the dynamics of information flow in a network. When applied to the neuronal network of the nematode Caenorhabditis elegans, known circuitry can be identified as separate eigenvector-based communities. For the macaque’s neuronal network, community detection can expose the hippocampus as an information hub; this result contradicts current thinking that the analysis of static graphs cannot reveal such insights. The application of community detection on a large scale human connectome (~1.8 million vertices) reveals the most prominent information carrying pathways present during a magnetic resonance imaging scan. We demonstrate that these pathways can act as an effective unique identifier for a subject’s brain by assessing the number of matching pathways present in any two connectomes. Author summary The dynamic response of a network to stimulus can be understood by investigating that system’s eigenvectors. The eigenvectors highlight the most prominent nodes; those that are either a major source or destination for information in the network. Moreover by defining a coordinate system based on multiple eigenvectors, the most prominent communities can be detected with the most prominent node detected alongside those in the community that funnel information towards it. These methods are applied to a variety of brain networks to highlight the circuitry present in a flatworm (Caenorhabditis elegans), the macaque and human subjects. Static graphs representing the connectomes are analysed to provide insights that were previously believed to only be detectable by numerically modelling information flow. Finally, we discovered that brain networks created for human subjects at different times can be identified as belonging to the same subject by investigating the similarity of the prominent communities.

Introduction 1 (MRI) scans considered herein achieve a 1 mm resolution for in vivo human subjects [2] 8 but 0.1 mm has been achieved with in vivo diffusion MRI data [3]. Improving scan 9 resolutions will enable higher fidelity connectomes and in turn require analytical tools 10 that can cope with the increase in scale. These detailed scans can be converted into 11 graphs through pipelines that produce consistent connectomes allowing studies and 12 comparisons [4]. Connectome analysis has often been constrained to small graphs 13 (< 1000 vertices) with the results of these studies compared with existing intuitions and 14 knowledge gained through experimentation [5], [6]. These previous studies on small 15 connectomes have identified influential regions by performing numerical flow simulations 16 of information travelling throughout the brain, but such an approach would likely be 17 intractable on large graphs containing millions of nodes. 18 The application of graph theory on a networked system produces an abstract 19 representation, i.e. a graph comprising of edges and vertices, that can be analysed to 20 better understand the movement of information. It has been argued that graph theory 21 has some blindspots when considering the dynamics of information flow, see [6], with 22 centrality measures unable to achieve the same insights as numerical methods. 23 Eigenvector centrality, see [7], is a metric that captures important information on a 24 graph's dynamics, in particular it highlights the vertices through which information 25 most frequently passes with this capability famously underpinning Google's PageRank 26 development [8]. 27 Spectral analysis, and eigenvectors in particular, are a cornerstone of the 28 developments herein with eigenvector centrality adapted and expanded upon. This 29 expansion is based on incorporating multiple eigenvectors to produce a dynamics-based 30 method for community detection. A common approach for community detection is to 31 focus on the static topology of the graph, i.e. the distribution of edges in a graph. One 32 such example is Leicht-Newman community detection for directed graph [9] that 33 compares the density of edges with that of a graph where edges are distributed 34 randomly to determine whether clustering and, hence, community division is present. 35 In contrast to Leicht-Newman's approach, dynamics-based community detection can 36 reveal communities of information flow that form in the presence of stimulus. These 37 communities are likely to be influenced by clustering but they are also affected by the The Caenorhabditis elegans is a non-parasitic nematode that is transparent and 45 unsegmented with a long cylindrical body shape that grows to about 1 mm in length. A 46 wiring diagram of the C. elegans nervous system was updated by Varshney et al. to 47 contain 279 somatic neurons [5]. The full wiring diagram includes chemical synapses, Communities of Dynamic Response (CDR) detailed in Algorithm 1. The circuits or, as 58 they will be referred to from hereon, communities identified by Varshney et al. [5] are 59 clearly visible in Fig. 1 and labelled as members of the positive v L3 community (orange) 60 or the negative v L3 community (yellow).

61
The one dimensional approach of Varshney et al. is only effective in certain cases as 62 it can only ever identify two communities. For example, v L6 presented in Fig. 2 could be employed to separate the vertices into two groups, but by employing CDR with 64 three dimensions (v L5 , v L6 and v L7 ), shown in Fig. 2 (a), a more nuanced picture 65 emerges where multiple communities are present. The community found by [5] from v L3 66 is marked with yellow vertices in Fig 1 and contains ASIL, AIAL, ASIR, AWAL, AIAR, 67 AWAR & ADFR. This same community is not identifiable using v L6 in isolation but is 68 depicted in light blue vertices in Fig. 2, which supports the validity of the community 69 designations detailed by the CDR method. The CDR relies on eigenvectors to detect communities in a network. These eigenvectors 72 capture the dynamics of the system and, hence, the CDR can also be exploited to 73 determine prominent vertices in a network.  [14][15][16][17]. Stating that these "analyses of anatomical and functional whole-brain 79 networks have largely failed to demonstrate the topological centrality of the hippocampus." 80 The conclusion of that study was "the functional capacity of a given region or 81 subnetwork cannot be fully discerned by only analyzing the static structural connectivity 82 of the brain". Functional capacity is understood to be an assessment of how effectively a 83 region can carry out a given function. In the case of the hippocampus, this would be an 84 assessment of its ability to receive information from across the whole network.

85
The CoCoMac database [18] supplied the connectome used by Mišić et al. [6], which 86 contained 242 vertices representing neuronal areas with each edge carrying equal 87 weighting. The CDR will demonstrate that, contrary to the claims of [6], the influence 88 of the hippocampus (CA1) can be discerned by analysing the structural connectivity of 89 the brain.

90
TFM is most prominent vertex in the macaque connectome, as shown in Fig. 3, and 91 it belongs to a community of two vertices (TFM and CA1) that are highlighted in pink. 92 CA1 is therefore part of the most prominent pathway for information in this 93 connectome. CA1's influence is further enhanced by being connected to TFL, which is 94 the second most prominent vertex according to v L1 . CA1 receives one of the two 95 outgoing connections from TFL, whilst also being the only receptor of information from 96 TFM. The reason why TFM and TFL have large values of v L1 is that they are 97 bottlenecks for information in the network with information from across the graph 98 arriving at these two nodes and only having three paths to choose from, two of which  The CoCoMac network employs a uniform weighting for all edges. A macaque's 109 brain will have variable weights for the edges between different neuronal areas.

110
Therefore, it is possible that if edge weights were known for this network, and the edge 111 between TFM and CA1 had a large weighting, then the hippocampus (CA1) could be 112 the largest element of v L1 . But given the lack of edge weighting information present, an 113 intuition is applied for this network that the highest v L1 vertices are the main 114 information collators (bottlenecks). These collator vertices tend to have relatively low 115 outdegree and pass information onto an influential region. This intuition can be tested 116 by defining the collation vertices as prominent vertices (PV) and vertices they pass 117 information on to as Outgoing Connection vertices (OCN). Table 1

details the PVs and 118
OCNs for a number of prominent vertices from Fig 3. A clear example of an OCN as a 119 known influential region is the M2-HL vertex as the PV that is connected to the M1-HL 120 as the OCN. In this case, the supplementary motor cortex (M2-HL) is acting as the 121 information collator that then provides information to the primary motor cortex 122 (M1-HL), which is the most influential region for motor control.   The distribution of high weight edges in scan 1 and scan 2, for subject 113, are result of scan 2 producing few relatively high weighted edges (> 1400) and a majority of 186 low weight edges (< 600), whereas scan 1 has a more even distribution with the highest 187 weighted edge only ∼ 1400 but with many edges above 1000. This uneven distribution 188 translates into the eigenvector entries where scan 2 contains only a few high entry 189 values whilst scan 1 has a more even distribution. The reduction in pathway size is, 190 therefore, due to fewer vertices in scan 2 achieving the threshold eigenvector entry value 191 to be included in the pathway, see the Materials and Methods Section.  The eigenvector pathways were used in a similar manner to assess whether pathways 202 matched, using the percentage match and PV distance categories shown in Table 3. 203 Given the criteria in the Materials and Methods Section the mean number of matching 204 pathways is presented in Fig. 6 (b). The Frobenius Distance can be seen in some cases 205 to highlight a scan-rescan match, but the pathway matching approach is successful in 206 every case with the matches clearly distinguished from non-matching pairs for both row 207 and column comparisons.

208
The subject with the lowest number of matching pathways, as detailed in Fig. 6 (b), 209 is subject 849. Subject 849's best match is still the scan-rescan pair, but it also 210 produced a lower mean number of matching pathways than some comparisons that were 211 October 24, 2018 10/18 not scan-rescan pairs, such as the comparisons between subject 142 and subject 492.

212
Investigating the pathways of subject 849 reveals that the shape and position of the 213 pathways appear to be similar, see Fig. 7, which would have produced a high number of 214 matching pathways. But the pathways are, with one exception, slightly offset from each 215 other despite mostly matching in terms of shape, direction and length. It is this offset 216 that reduces the mean number of matching pathways to 1.4 when visually there appears 217 to be at least 4 matches, possibly 5. There are always errors in the images produced 218 from MRI scans, even when using the same equipment and procedure, with small errors 219 occurring because of slight changes in image orientation and magnetic field 220 instability [20]. It is possible that these offsets are a result of such errors.

221
(a) (b) (c) Fig 7. Human brain of subject 849 represented by x,y,z outlines from a brain surface model. Prominent vertices from scan 1 are displayed in green with scan 2 marked in blue. The most prominent vertex for each eigenvector is marked with a circle or cross.
View from the side with eigenvectors labelled.

222
We found that incorporating multiple eigenvectors in the detection of communities 223 produced a more nuanced picture of a system's circuitry than had previously been 224 achieved by using a single eigenvector. The communities detected are, due to their scan and rescan, this method could assess if there were changes to the pathways used to 239 accomplish the task every time the task was repeated.

240
The results of graph based analysis, and therefore the eigenvector method presented, 241 is limited by the network that has been constructed. In particular, for human brains, 242 undirected networks are the most prevalent but approaches do exist for the creation of 243 directed connectomes from MRI scans. Without a directed network the insights 244 available are limited as the true dynamics of the brain are concealed with key 245 information lost by ignoring the imbalance in the outdegree to indegree ratio of vertices. 246 It is known that the brain employs directed connections and without knowledge of these 247 it is difficult to uncover if a prominent vertex is a sink or source of information in the 248 graph. Indeed the Laplacian matrix, used for the analysis of the directed graphs, 249 emphasises this imbalance further as each diagonal element is equal to the sum of the 250 non-diagonal elements in its row i.e. the indegree of a vertex. But, as was displayed 251 previously with the investigation of C. elegans and their electrical junction network (see 252 the C. Elegans Connectome Section), insights can still be gained when using an 253 undirected graph. As noted in the previous section, another source of uncertainty comes 254 from the scan that generated the connectome where slight errors can affect the results, 255 in particular the location of prominent pathways in the brain.

256
It is possible to conject as to how such connectome analysis could be used. Each 257 pathway/community is associated with an eigenvector, the pathways are therefore 258 ranked by their associated eigenvalue with the first eigenvector/eigenvalue associated 259 with the largest dynamic response of the system. This provides a metric for ranking the 260 prominence of the pathways in the brain. It is possible that such a capability could be 261 applied as a quantitive assessment of, for example, a stroke victim's progress in 262 retraining neural pathways to regain speech. The prominence of the relevant neural 263 pathways could be monitored to observe their growing prominence as the pathways are 264 retrained. This could provide a metric with which to measure progress and provide 265 further insight into the process of brain plasticity. This analysis could even form the 266 basis of the treatment itself, where it has been observed that improved understanding of 267 pathological circuitry has already guided deep brain stimulation used in the treatment 268 of Parkinson's disease, depression, and obsessive compulsive disorder [21]. There is also 269 potential for employing noninvasive brain stimulation techniques, such as transcranial 270 magnetic stimulation, as a significant part of the challenge is in identifying specific 271 neural systems that should be targeted for intervention [21].

273
A graph is defined as G = (V, E), where there is a set of V vertices and E edges, which 274 are unordered pairs of elements of V for an undirected graph and ordered pairs for a 275 directed graph. The degree of a vertex is the number of edges connected to that vertex. 276 In the case of a directed graph, there is an indegree and outdegree; indegree is the 277 number of connections entering a vertex and outdegree is the number of connections 278 exiting a vertex.

279
The adjacency matrix, A, is a square n × n matrix when representing a graph of n 280 vertices. This matrix captures the network's connections where a ij > 0 (a ij is the ij th  The eigenvectors of both the Laplacian and adjacency matrices are considered in this 292 work. The dominant eigenvalue, λ 1 , for the adjacency matrix is the largest eigenvalue in 293 magnitude while for the Laplacian matrix it is the smallest eigenvalue (λ 1 = 0) [22].

294
The eigenvector associated with λ 1 is referred to as the first eigenvector, specifically the 295 first left eigenvector (FLE) when considering the Laplacian matrix of a directed graph. 296 The direction of an edge in this work defines the direction of travel for information. 297 Therefore, if an edge is going from vertex i to vertex j, information is travelling from i 298 to j. This is important as it affects the interpretation of the FLE. For example, if a 299 packet of information departs every vertex in the network then the largest elements of 300 the FLE are the vertices that information is funnelled towards. If the direction of 301 information travel along an edge was reversed then the FLE would identify the vertices 302 that are most effective sources for spreading information quickly across the whole 303 network. This knowledge has been used previously to allocate resources that drive a 304 network to a fast convergence to consensus [23], [24]. 305 Considering the eigenvectors that proceed the first eigenvector, they can be 306 understood to highlight vertices that collate information from across the whole network. 307 But each proceeding eigenvector represents a slower mode of response for the system, 308 therefore the vertices highlighted receive the information more slowly than the vertices 309 that were prominent according to the first eigenvector or any associated with a smaller 310 eigenvalue of the Laplacian matrix than the eigenvalue being considered. connection to a vertex that is at a greater distance from the origin. This can be seen in 319 Fig. 8 (a) & (b) where communities are comprised of vertices that are each associated

323
When considering the dynamics of the whole network, the FLE determines the most 324 prominent vertices where, as was explained in the previous section, it represents the 325 fastest response of the system. These prominent vertices in Fig. 8 are marked with a 326 black outline and represent the nodes with the largest, in this example, v L1 value that 327 are not connected to a node with a greater v L1 value.

328
The toy example in Fig 8 employs a k-Nearest Neighbour topology, whereby vertices 329 are randomly distributed on a plane before connecting to their five nearest neighbours. 330 The k-NNR topology is used since it is a good topology for demonstrating community 331 structure as the nearest neighbour rule encourages communities to form.

Algorithm 1 Detecting communities of dynamic response procedure Community Detection
Find the first three, normalised, eigenvectors of the Laplacian, L ∈ IR n×n , (associated with the three smallest eigenvalues in magnitude) v L1 , v L2 and v L3 . if q > 1 then for j = 1 to q do k = C c(j) (1) P oi→k (j) is the scalar projection of e oi onto e k . end for Remove i from C j ∀ j where P oi→k (j) = max(P oi→k ) end if end for A list of communities have been created where C p contains a list of vertices belonging to community p. end procedure

333
This section refers to the information flow model used by Mišić et al. [6] to investigate 334 the CoCoMac connectome. The model was setup as a discrete-event queueing network 335 where signals were continually generated, at randomly-selected grey matter vertices in 336 the network, and assigned randomly-selected destination vertices. The signals then 337 travelled through the network via white matter projections (edges). Grey matter 338 vertices were modelled as servers with a finite buffer capacity, such that if a signal unit 339 arrives at an occupied vertex, a queue will form. Upon reaching its destination vertex, 340 the signal unit was removed from the network. Mišić et al. used  For the analysis of human connectomes, a prominent pathway was detected for each of 348 the ten first eigenvectors. Each pathway was selected as the most prominent community, 349 from the communities generated by the CDR algorithm (see Algorithm 1). Prominence 350 was determined based on which community contained the vertex with the largest 351 eigenvector entry in magnitude; referred to as the prominent vertex (PV).

352
For a given eigenvector, v, only community nodes with min (v) > 0.01 were included 353 in the pathway. This ensured that the pathways only included the most prominent 354 members of each community, which produced clearer results when performing a pathway 355 comparisons.

356
The metric developed considers the shortest distance from all the vertices of one 357 path to the nearest vertex that belonged to the other path. Vertices were considered 358 overlapping if they were from the same voxel or they were in an adjacent voxel (i.e. 359 < 1.42 mm distance away). The percentage of vertices within this overlapping distance 360 was then calculated. To determine if a pathway matched a threshold percentage match 361 had to be achieved. For the results in this paper a mean value was taken for a range of 362 threshold values. Therefore the mean number of overlapping pathways was checked for a 363 range of percentage match thresholds from 50% to 90%, checked at 10% increments with 364 a requirement that the PV distance be less than 15 mm. PV distance is a comparison of 365 the point-to-point distance between the PV's belonging to the matching pathways.

366
A pathway from one scan might be the closest match to multiple pathways from 367 another scan, in this case only the closest match would count with the other matches 368 ignored. For the example shown in Table 3, an eigenvector pathway has 7 matches with 369 both 1, 3 and 7 from scan 1, therefore two of these would be ignored for a 50% 370 matching criteria resulting in 5 matching pathways in total. where a 1 and a 2 are elements of the adjacency matrix for scan 1 and scan 2 respectively. 375