Multifactorial Dynamics of White Matter Connectivity During Adolescence

Studying developmental changes in white matter connectivity is critical for understanding neurobiological substrates of cognition, learning, and neuropsychiatric disorders. This becomes especially important during adolescence when a rapid expansion of the behavioral repertoire occurs. Several factors such as brain geometry, genetic expression profiles, and higher level architectural specifications such as the presence of segregated modules have been associated with the observed organization of white matter connections. However, we lack understanding of the extent to which such factors jointly describe the brain network organization, nor have insights into how their contribution changes developmentally. We constructed a multifactorial model of white matter connectivity using Bayesian network analysis and tested it with diffusion imaging data from a large community sample. We investigated contributions of multiple factors in explaining observed connectivity, including architectural specifications, which promote a modular yet integrative organization, and brain’s geometric and genetic features. Our results demonstrated that the initially dominant geometric and genetic factors become less influential with age, whereas the effect of architectural specifications increases. The identified structural modules are associated with well-known functional systems, and the level of association increases with age. This integrative analysis provides a computational characterization of the normative evolution of structural connectivity during adolescence.


Introduction
The human brain, as depicted by a network of interconnected regions, acquires a distinctive structural and functional architecture over the course of development [1], [2]. This distinctive architecture has a hierarchical modular organization [3], with special hub regions that are theorized to facilitate the integration of modules [4], [5]. It is generally postulated that the structural organization of the brain provides a basis for the emergent functional systems [6] and thereby, of cognitive capacities [7]. Therefore, studying developmental changes in the white matter connectivity and resulting structural organization is critical for understanding neurobiological substrates of cognition [8], [9], learning [10], and psychiatric conditions [11]. This becomes especially important during adolescence and young adulthood when the human brain undergoes a protracted course of remodeling to support the rapid expansion of its behavioral repertoire [12].
Various empirical studies have revealed several factors that are correlated with the observed white matter connectivity, including regional genetic expression profiles, brain geometry, and wiring costs of fibers [13]- [16]. Despite the significant correlation of these factors with the observed structural connectivity, they do not fully explain the distinctive organization of the brain network [14], [17]. For this reason, in addition to such intrinsic factors, several higher level factors, corresponding to certain architectural specifications (e.g. presence of segregated modules), have been also considered to explain distinctive features of the brain network organization [18]. All these findings highlight the need for a robust multifactorial analysis of the white matter connectivity. This multifactorial analysis is crucial to understand the extent to which multiple factors, such as brain geometry and architectural specifications, jointly describe the brain network organization. Such a multifactorial analysis can also reveal the temporal changes in the contribution of these factors during development, a topic that has been only sparsely studied.
In this work, we have developed a multifactorial generative model of structural connectivity using Bayesian network analysis [19], [20]. This approach facilitates elucidating the contribution of different factors in explaining the empirical connectivity in a data-driven fashion, without necessarily assuming any causal theories regarding the formation of the network. One important advantage of generative models is that developmental effects can be studied by analyzing the evolution of model parameters across ages. Using our generative model and diffusion imaging data of a large community sample of youth, collected as a part of The Philadelphia Neurodevelopmental Cohort (PNC) dataset [11], we studied the contribution of certain architectural specifications, which promote a modular yet integrative architecture, and brain's geometric and genetic features in explaining the observed structural connectivity. First, we investigated how much of the observed structural connectivity can be described by a base model of connectivity that includes only geometric and genetic factors. Then, we demonstrated how the inclusion of the architectural specifications increases the accuracy of the generative model in explaining the connectivity. Finally, we quantified the developmental effects in our multifactorial model in terms of the relative contributions of the factors across ages.
Our results demonstrated that the initially dominant geometric and genetic factors become less influential with age, whereas the effect of higher level architectural specifications that promote a modular yet integrative organization increases. The identified structural modules of the brain are associated with well-known functional systems, and the level of association increases with age as well. This integrative analysis provides a computational characterization of the normative changes in structural connectivity in the human brain during the course of development.

Participants
Institutional Review Board approval was obtained from the University of Pennsylvania and the Children's Hospital of Philadelphia. We used a large sample of healthy young individuals (ages between 8-22 years) from the Philadelphia Neurodevelopmental Cohort (PNC) dataset [11], each assessed using diffusion MRI. Participants were excluded due to poor imaging data quality, or a history that suggested potential abnormalities of brain development such as medical problems that might affect brain function, inpatient psychiatric hospitalization, or current use of psychotropic medication. The final study sample included 818 participants, with 361 males (mean age: 15.10 years, std: 3.44) and 457 females (mean age: 15.25 years, std: 3.33).

Image Acquisition and Brain Network Construction
Diffusion weighted magnetic resonance imaging (dMRI) scans were acquired for each individual. Quality assurance of the data was conducted to detect motion and scanner related artifacts and outliers, using the procedure described in [21]. The details of image acquisition and network construction are given in Supplementary Note S1 and illustrated in Fig. 1. Our pipeline included brain extraction using FSL [22], DWI de-noising using Slicer [23], tensor fitting using multivariate linear fitting [24], cortical/subcortical segmentation using Freesurfer [25], and probabilistic tractography (seeded from white matter / gray matter boundary) using the probtrackx utility of FSL [22].
The final brain network of a participant had 86 nodes corresponding to the gray matter regions of interest (ROIs), including 34 cortical regions of the Desikan atlas [26], 8 subcortical regions, and cerebellum in the left and right hemispheres [25]. The complete list of regions is given in Supplemental File 1. The edges of the network had weights corresponding to the normalized number of streamlines between regions, as generated using probabilistic tractography. The edge weight between two regions i and j was calculated as o ij = (w ij + w ji )/2, where w ij = s ij /v i , s ij is the streamline count between the regions, and v i is the volume (the number of voxels used for seeding) of the region i. The normalization by the region volume accounts for the differences in the region size. The final value was rounded to have integer valued edge weights.
In order to prune possibly spurious connections, we used consistency based thresholding [27], keeping only the most consistent edges across all participants. Due to broadly ranging estimates of actual connection density in the literature [5], [27], [28], we repeated our experiments in a density range of 10-30%, with different thresholds. Here, we report results for 15% density. Very consistent results were achieved for other density levels (results are provided in Supplementary Notes S8).

Generative Model for Brain Connectivity
A schematic illustration of the generative model is shown in Fig. 2. Given a set of prior expectations ( ) over the streamline counts between N regions, we model the observed streamline counts ( ) between the regions by a Dirichlet-Multinomial distribution, as proposed before [29], [30]. where , e i = ∑ e ij N j=1 , t ij = o ij + e ij , and t i = ∑ t ij N j=1 . The variables o ij and e ij represent the observed number of streamlines (o ij ) between regions i and j, and our prior expectations (e ij ) for the same.
In this study, we propose a novel base model of streamline connectivity by postulating specific expectations (e ij ) for the number of streamlines between regions, relying on their geometric and genetic features. We considered different mathematical expressions for modeling e ij and selected the following one using Bayesian model comparison [31] (see Supplementary Note S2 for details).
In the base model, c ij corresponds to the Euclidean closeness (inverse of distance) of region centers, and the second factor g ij is the genetic similarity between regions. The unknown exponents, α and β, determine the contribution of each factor.
In our experiments, the first factor c ij was different for each participant, calculated from their T1 images. The second factor g ij , however, was fixed for all participants, calculated using Allen Institute for Brain Science (AIBS) microarray expression dataset [32], as explained below. In order to facilitate a comparison between the contribution of different factors, both c ij and g ij were normalized to have the same maximum value with the observed number of streamlines (o ij ), for each participant individually. Then, the resulting matrix ≡ {e ij } was normalized so that its summation is equal to the summation of observations ≡ {o ij }. This guarantees that the maximum likelihood solution of Eq. 1 is attained when e ij = o ij . Although the Dirichlet-Multinomial distribution is a natural choice for modeling relative strength of connectivity among regions and has been commonly used in the literature for similar purposes [20], [29], [30], the plausibility of our parametric model needs to be established prior to subsequent experiments. For this reason, we demonstrated that our generative model performs very similar to nonparametric models that have been developed for specialized purposes (see Supplementary Note S3 for details).

Genome-wide Expression Profiles
AIBS microarray expression dataset [32] consists of 3702 brain tissue samples from brains of six donors (5 males, 1 female, ages:24-57 years) [33]. Each sample is assessed by 58,692 probes. The expression levels were normalized across brains during comprehensive pre-processing steps [34], [35]. Among all probes available in the microarrays, we only considered the 17,348 uniquely annotated transcripts that are included in [36]. In order to select probes corresponding to the genes that are most relevant to brain function, we used the highest 10% of genes ranked by their differential stability (DS) indices [36].
Please refer to Supplementary Note S4 for details on our pipeline to calculate the genetic similarity between regions (g ij ). Our pipeline included steps to find correspondence between AIBS tissue samples and 86 regions of our atlas, to exclude genes with expression values that were not significantly different than the background expression level, to decrease the dimensionality using linear discriminant analysis (LDA), to calculate Pearson correlation between regional genome-wide expression profiles, and to adjust the genetic similarity values for distance between regions.

Detection of Modules and Hubs
Architectural specifications, corresponding to the presence of a modular structure and integrative hub regions, were imposed by modifying e ij as e ̅ ij = e ij + λz ij e ij , where the structure of the variable z ij encodes the architectural specifications. When assuming only modules but not hubs, z ij is 1 if two regions are in the same module and -1 otherwise, and e ij is calculated as in Eq. 2. Each module defines a cluster of regions that are densely connected to each other while only being sparsely connected to the regions outside their modules [37]. This model increases the number of expected streamlines between any two regions by λe ij , if regions are in the same module, while decreasing the same if they are in different modules. When there are both modules and hubs, z ij is 1 if two regions are in the same module, or at least one of them is a hub region and they are in the same hemisphere, and -1 otherwise. In this case, the model expects hub regions to make more connections with all other regions. The same hemisphere constraint was used to relax this high demanding expectation.

Inference of Model Parameters
The unknown parameters α, β, and λ in Eq. 2,3 and the variable z ij in Eq. 3 were estimated by maximizing the model likelihood in Eq. 1. This was performed for each participant individually. We estimated the unknown parameters for the three scenarios (base model, modular model, and modular yet integrated model) independently, in order to compare between the scenarios. Please refer to Supplementary Note S5 for implementation details.

Proportion of Explained Connectivity
can be considered as a measure of the proportion of the observed connectivity that is explained by the generative model. Note that this measure has an upper bound of 1, and is comparable across participants. The contribution of different factors (e.g., distance or genetic similarity), can thus be quantified by calculating ρ values for models including single factors, or by comparing ρ values with and without using a specific factor in the model.

Statistical Significance of Factors
In order to calculate the statistical significance of the contribution of different factors (e.g., closeness or genetic similarity), we used the Wilcoxon signed-rank test to compare the Akaike Information Criterion (AIC) [31] values of two models with and without the selected factor. The null hypothesis assumes that difference between the two models follows a symmetric distribution around zero. Additionally, in order to assess the significance of identified modules, compared to randomly defined ones, we used permutation testing. The null hypothesis assumes that an increase in ρ value (Eq. 4), at least as high as the actual one, can be achieved by random assignment of regions into similarly sized modules. Details are provided in Supplementary Note S6.

Consensus Modules
The generative model of streamline counts (Eq. 1) was run individually for each participant. Therefore, the modular organization (that is, the number of modules and assignment of regions into modules) shows differences between participants. In order to define a consensus of modular organization, we used the mean brain network (o ij ) and the mean closeness map (c ij ) of all participants, and estimated the unknown parameters α, β, λ, i , again by maximizing the model likelihood in Eq. 1. Note that genetic factor g ij was already fixed for all participants.

Functional Systems
In order to study correspondence between identified structural modules and functional systems of the brain network, we assigned the 86 brain regions to 10 functional systems [38], namely auditory, cinguloopercular, default mode, dorsal attention, frontoparietal, motor, subcortical, ventral attention, visual, and others, using the definitions from Gu et al. 2014 [39] (see Supplementary File 1 for the complete list). This resulted in predefined functional systems that are common for all participants.
In order to quantify the similarity between the alliance of regions determined by consensus structural modules and the functional system, we used Normalized Mutual Information (NMI) [40]. NMI quantifies the agreement between structural modules and functional systems in terms of the assignment of regions to same/different modules/systems. It has values between 0 and 1, where 1 indicates perfect agreement. Statistical significance of correspondence between structural modules and functional systems was assessed using permutation testing. Please refer to Supplementary Note S7 for details.

Plausibility of the Generative Model
We compared the networks simulated by our generative model with those that were generated by a nonparametric model [14]. Our model was able to better represent the observed streamlines counts as compared to the nonparametric model, while both models generated networks with similar topological features. Results are shown in Supplementary Fig. S2. We also compared the modules identified by our generative model to those that were identified by Louvain method for community detection [41]. We first detected modules using Louvain method and then detected the same number of modules using our model. There was a substantial agreement between identified modules. Results are shown in Supplementary Fig. S3.

Base Model of Streamline Connections
We first defined a base model of connectivity that includes only geometric and genetic factors. Illustration of the two factors (Fig. 3a), namely physical closeness and genetic similarity (c ij , g ij ) provides an initial qualitative assessment of the contribution of the factors in explaining observed streamline counts between regions. According to Fig. 3a, physical closeness seems to be the dominant factor, with the overall organization of the observed connectivity matrix being very similar to the organization of the adjacency matrix defined by physical closeness. The most important specification that the genetic factor provides is the distinction between cortical, subcortical, and cerebellar regions. A closer look into the genetic similarity, however, reveals that the genetic similarity between regions is not homogenous inside the cortex, reflecting distinctions between different lobes.
Note that for all following analyses, the generative model was run individually for each participant, and summary statistics were computed. The base model was run to infer participant-specific parameters (α, β, see Eq. 2) and to compute model likelihoods. We then computed the proportion of the observed connectivity that is explained by the model (ρ, see Eq. 4). The measure ρ quantifies the improvement over a random model and has a maximum value of 1.
Our experiments revealed that the most dominant factor in describing streamline counts between regions was physical closeness (Fig. 3b, c). When using single factors alone, we had average (across all participants) ρ of 0.60 and 0.05 for closeness and genetic similarity, respectively. Inclusion of the closeness factor increased ρ by 0.56 on average (95% CI: 0.559, 0.561) as compared to the model including only the genetic factors (Wilcoxon signed-rank test to compare AIC values, p < 1 × 10 −16 ). The average increase in ρ was 0.003 (95% CI: 0.002, 0.004) with the inclusion of the genetic factor in the model including only closeness factor (Wilcoxon signed-rank test to compare AIC values, p < 1 × 10 −16 ). Finally, when using the both factors (Fig. 3c, d), we observed the best model fit (ρ=0.61). Note that inclusion of a factor does not increase ρ in a linear fashion due to nonlinear dependencies between factors and the resulting model likelihood.

Architectural Specifications of Brain Network
With the inclusion of the architectural specifications that encodes the modular structure, we observed a significantly better model fit (ρ=0.70) (Fig. 3c, d), with average 0.096 increase in ρ (95% CI: 0.096, 0.098) compared to the base model (Wilcoxon signed-rank test to compare AIC values, p < 1 × 10 −16 ). We also assess the significance of modules compared to randomly defined modules (permutation testing with random modules, p < 1 × 10 −3 ).
After imposing both modular structure and presence of hub regions, the average ρ was 0.72, which was a significant increase compared to the modular structure alone and to the base model (Wilcoxon signedrank test to compare AIC values, p < 1 × 10 −16 ) (Fig. 3c). Thus, our analyses supported the hypothesis that architectural specifications have a significant role in explaining the observed streamline counts between regions.

A Modular yet Integrative Organization
We identified 6 consensus structural modules. Two pairs of symmetric modules and two modules extending to both hemispheres in Fig. 4a, b reflect the consensus of module assignments from all participants. The final assignment of regions into modules is given in Supplemental File 4. The consensus set of hub regions for all participants is illustrated in Fig. 4c.
Additionally, we investigated how structural network organization is related to the functional specialization of regions. We calculated the similarity between the alliance of regions determined by the structural modules and the predefined functional systems using Normalized Mutual Information (NMI). This yielded an NMI value of 0.48 (permutation testing with random modules, p = 0.0003), suggesting a significant functional correlate of structural modules.

Developmental Effects
Participant-specific parameters (α, β, λ) and ρ values were studied across ages in order to show how model parameters and the contribution of different factors evolve during development. When the full model was run with the architectural specifications, we observed significant changes in the model parameters with age. The coefficient of the physical closeness (α) showed a significant developmental decrease (Pearson correlation coefficient r = −0.18, p = 5 × 10 −7 ), while the coefficient of the genetic similarity (β) showed a significant increase (Pearson correlation coefficient r = 0.12 , p = 0.0003 ). We also observed a significant increase in the parameter (λ) encoding architectural specifications corresponding to a modular yet integrative organization (Pearson correlation coefficient r = 0.26, p = 5 × 10 −14 ).
In contrast to the decreasing effect of the base model, the contribution of the architectural specifications in ρ was positively correlated with age (Fig. 5d, e). The relative change in ρ value as compared to the base model showed significant increase both for modular structure alone (Pearson correlation coefficient r = 0.27, p = 1 × 10 −15 ) and for modular structure with hubs (Pearson correlation coefficient r = 0.32, p < 1 × 10 −16 ). Additionally, the modularity coefficient (Q) that quantifies the extent that a network has a modular organization (that is, dense connections within modules, sparse connections between modules) [42] showed a significant increase by age (Pearson correlation coefficient r = 0.26, p < 4 × 10 −14 ) (Fig.  5f).
We finally investigated how the relationship between structural network architecture and functional specialization of regions evolves during development. We calculated individual NMI values for each participant, by comparing alliance of regions determined by the structural modules and the predefined functional systems. We observed a significant positive correlation between individual NMI values and ages of participants (Pearson correlation coefficient r = 0.11 , p = 0.002 ), indicating that the functional relevance of structural modules increases during development.

DISCUSSION
We investigated how multiple factors contribute towards explaining observed streamline counts, and how their effects change over the course of development. We found a dominant contribution of brain geometry in explaining the observed streamlines, with regional gene expression profiles contributing less, albeit significantly. However, the brain network organization could not be fully accounted for by only considering geometric and genetic features of brain regions. Architectural specifications, which promote segregated modules and integrative hub regions, explained a significant proportion of the observed connectivity. Notably, the extent to which these architectural specifications contribute towards explaining structural connectivity increased during development, relative to the contribution of geometry and genes. Additionally, the functional relevance of the structural modules increased during development.

Wiring Costs and Genes
The human brain develops under special sets of constraints reflected through its physical features. It is spatially embedded into a three-dimensional anatomical space and is subject to metabolic costs in forming axonal connections between regions [43], [44]. Many aspects of brain networks have been linked to such geometric and cost related constraints, via conservation principles such as minimization of wiring cost or energy consumption [45], [46]. Our results provide important insights into the contribution of geometry in explaining observed connectivity between regions, by quantifying the extent to which physical distance between regions explains the structural connectivity. The distance factor was the most dominant contributor in explaining the observed connectivity (Fig. 3). This finding is consistent with the major role of distance in the economy of human brain organization [15], [44], [47].
The effect of regional gene expression levels in the formation of axonal connections has recently begun to be explored in postmortem tissue samples [13]. Notably, in human brains, gene expression profiles define only subtle differences between regions of the neocortex [32], [36], with significant genetic dissimilarities among major sections, namely cortex, subcortex, and cerebellum. Our results were consistent with this fact, as the main genetic differences were observed among these three major sections (Fig. 3).

Connectivity is More Than Wiring Costs and Genes
It is expected that the brain network organization cannot be fully accounted for by wiring costs alone [14], [15], [17], [18], [48]. By using our generative model, we quantified the gap between the observed streamline counts and what is expected from a base model considering only geometric and genetic features of the regions. The proportion of the connectivity explained by the base model (ρ=0.61) pointed to the presence of other factors accounting for the unexplained portion. One possible hypothesis is that brain structural connectivity is shaped in order to balance the trade-off between operating cost minimization and the adaptive value of the resulting organization [44].

Distinctive Architecture of Brain Network
We demonstrated that the modular yet integrative organization of the brain, which has been commonly demonstrated for functional networks [4], is also evident in structural brain networks (Fig. 4). Our results supported the hypothesis that architectural specifications accompany geometric and genetic specifications in explaining the brain network organization [17]. Observed connectivity was explained best by a model that incorporates such architectural specifications related to the presence of modules and hubs (Fig. 3). Although our results demonstrated the importance of the architectural specifications, it is still unclear what other factors can be associated with these specifications. A promising future research direction can be the identification of molecular, as well as system-level features of the brain that correlate with this tendency to have a distinctive architecture.
The identified structural modules showed noticeable symmetry between hemispheres. Notably, the alliance of regions in forming structural modules highlights their functional correlates. One pair of symmetric modules (modules 1,4 in Fig. 4) was mainly defined by regions of frontal and cingulate cortex, encompassing the default mode network, cingulo-opercular system, and ventral attention system. One module that extended to both hemispheres (module 2) mainly consisted of regions of the visual system, including fusiform, inferior temporal, lateral occipital, and lingual. Homotopic visual areas are densely connected through callosal connections [49], possibly in order to facilitate the continuity of perception [50]. This may explain the presence of this bi-hemispheric module. The same module also included left and right cerebellar cortices, that are again densely connected to each other. The division of the temporal lobe between modules can be putatively associated with the separation between visual and auditory/language pathways. Superior and medial temporal cortices were assigned to modules 0 and 5 that included mainly parietal and subcortical regions. The connections between superior temporal cortex and inferior parietal cortex constitute an integral part of the speech processing system [51], [52]. On the other hand, the inferior temporal cortex was assigned to the bi-hemispheric visual module. The ventral stream of visual processing utilizes connections between visual cortex and inferior temporal cortex [53], [54].
Our analyses also demonstrated that the resulting modular organization of the structural brain network is significantly associated with emergent functional systems, supporting previous findings on the positive correlation between structural and fMRI-based functional connectivity [55], [56]. Elaborating the exact role of topological features, such as the presence of structural hubs, in facilitating links between structural modules and functional systems is a promising future research direction.

Brain Network Becomes Even More Distinctive During Development
The human brain undergoes a protracted period of development from childhood to adulthood [57]- [59] that is assumed to be linked to the development of many sophisticated cognitive functions. Cognitive and behavioral changes occurring throughout childhood and adolescence [60] make these periods especially critical for investigations on changes of the white matter connectivity.
Developmental changes in the contribution of individual factors, as suggested by our results, presented an interesting picture of the developing brain during adolescence. The contribution of geometry in explaining streamline counts decreases over development, possibly implying that the links between wiring costs and the network organization of the brain weaken during development, with increasing emphasis on the adaptive value of the resulting organization. This possibility was further supported by the increasing effect of the architectural specifications that promote a modular organization (Fig. 5).

Methodological Considerations
Several practical limitations of the current work should be considered when evaluating the presented results. Measuring structural connectivity using diffusion tensor imaging may suffer from the inability to accurately describe the regions with crossing fibers [61]. More advanced imaging techniques with higher resolutions [62] can be used in future studies where such data has been collected. Furthermore, diffusion based tractography has method-specific biases in fiber lengths, making the resulting connectivity matrices biased towards short-range connections [61], [63]. Therefore, the reported strong contribution of Euclidean distance in explaining the observed streamline counts may be partly attributed to the short-range bias of tractography algorithms. In order to demonstrate the robustness of the reported results to such confounding factors, we used consistency based thresholding [27] to prune possibly spurious connections and repeated our experiments with different network density (threshold) levels. The relative contribution of all factors remained the same across different density levels, suggesting high reliability of our results (Supplementary Fig. S5).
The geometric proximity of regions was computed using Euclidean distance. Other more sophisticated distance measures such as geodesic distance, defined within the white matter geometry, could also be used for the same purpose. However, this approach could lead to a circular argument when explaining connectivity by geometry, since white matter geometry needs to be defined by observed streamline connectivity.
The blurring effect caused by head motion during image acquisition may introduce a spurious increase in the contribution of geometry, which will possibly be higher in younger participants. In order to have results robust to motion effects, we have used the comprehensive quality assurance pipeline as described in [21]. The primary measurement of in-scanner head motion was mean relative volume-to-volume displacement as determined by rigid-body motion correction. According to this motion metric, our sample (mean: 0.46mm, std: 0.4mm) included images with "excellent" and "good" quality (please refer to Table 1 in Roalf et al. 2016). Nevertheless, more analyses can be done in future by including the amount of motion as a covariate, although which motion parameters to be included remains a current challenge.
In our analyses, we defined a generative model of structural connectivity using a parametric model (Dirichlet-Multinomial). Thus, the measure ρ that we used to calculate the proportion of the observed connectivity explained by the model, is a reliable measure for the same, as long as the selected parametric model is an appropriate choice for our dataset. In order to demonstrate that our model can reliably describe structural connectivity, we compared it with several nonparametric models, validating the plausibility of our parametric choices (Supplementary Note S3). We also showed that the reported results do not change when using a more conventional measure of performance. Instead of using the measure ρ, we repeated our experiments using root mean square error (RMSE) between empirical networks and the ones that are generated by our model. The contributions of factors were stable across different metrics, highlighting reliability of our results (see Supplementary Note S9).
Traditionally, hub regions are defined based on the degree, strength, centrality, or the participation coefficient distributions of the regions [64], [65]. For this reason, our definition of hubs based on certain architectural specifications may seem to be an unorthodox choice. We, therefore, showed that the identified hub regions have high participation coefficients and betweenness centrality (see Supplementary Note S10). This suggests that using certain architectural specifications to identify hub regions provides similar results as compared to traditional definitions.
The region-to-region genetic similarity was calculated from genome-wide expression levels of brain tissue samples from six different donors [33] of the Allen Brain Atlas [32]. The inter-regional genetic similarity factor was therefore same for all participants in our imaging sample. However, it is expected that the gene expression profiles, and hence the region-to-region similarities change over the course of development. The current expression profiles were computed using microarray data from adults (ages:24-57 years), which precludes a detailed analysis using our sample of young individuals (ages: [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. This important limitation should be considered when interpreting our genetic findings. However, our work presents an important first step in studying the contribution of gene expression to brain organization, notwithstanding the fact that our data does not have in vivo characterization of the brain's transcriptional distribution. When studying the relationship between the a priori functional systems and the structural modules, we defined a fixed set of functional systems using results from a study [38] that had used a young adult sample (mean-age: 25.2 years, std: 2.5 years). In order to have more reliable interpretations regarding the relationship between functional systems and structural modules, however, future investigations should consider the fact that the functional organization of the brain also changes throughout development [2]. Defining functional systems for each participant individually, using a multimodal dataset including functional imaging modalities, is a possible future direction to take.

A Generic Model for Future Neuroscience Studies
In this study, we considered effects of two intrinsic features, namely distance and gene expression, plus two architectural specifications corresponding to the presence of modules and hub regions. One advantage of the proposed methodology is the ease of inclusion of novel factors, including fiber lengths, cortical gyrification, and cytoarchitectonic or other histological properties of regions, providing a generic tool for future studies. Furthermore, probabilistic generative models can successfully represent different individual modalities of brain connectivity such as functional connectivity and can be used to model different modalities jointly. Multimodal connectivity is an especially exciting avenue of research to pursue.
Employing generative models of brain connectivity may further advance network neuroscience by providing a means of modeling joint and possibly nonlinear contributions of multiple factors in describing structural and functional characteristics of the brain network. This may enable development of new methodologies in effectively identifying alterations in connectivity patterns that have been shown to be present in clinical samples [66], between sexes [67], and during learning [68] or development [69].  The expected number of connections are calculated based on the genetic and geometric (Euclidean closeness) relationships among regions, as well as by considering certain architectural specifications. After model fitting, the two main outcomes are the quantification of the contribution of different factors in explaining the observed streamline counts and the organization of brain regions. Finally, a developmental analysis to probe changes in these outcomes across ages is performed. The models including modules and hubs also include closeness and genetic similarity factors. All pairwise comparisons between models using Wilcoxon signed-rank test yield statistically significant differences. (d) Comparisons of mean ρ values between three models, namely the base model (closeness and genes), the modular model, and the modular yet integrative model. (a) 6 modules were identified with two bi-hemispheric (modules 2,3) and two symmetric pairs of modules. The alliance of regions into modules reflects the functional correlates of these modules. For instance, module 2 mainly consists of regions of the visual system. The modules 1 and 4 form a structural basis for several functional systems including the default mode network, cingulo-opercular system, and ventral attention system. Connections between regions of the module 0 constitute an integral part of the language processing system. (b) Visualization of same modules on axial and coronal slices. (c) Hub regions of the brain network. Hubs are mainly accumulated at the anterior, posterior, and temporal regions instead of being around the geometric center of the network that would be preferred by a geometric model of connectivity. The proportion of explained connectivity as measured by ρ across ages, using (a) only genetic similarity factor, (b) only closeness factor, and (c) both. Pearson correlation coefficient (r) is given at the top of each figure. In all the three cases, explained connectivity shows a significant decrease over the course of development. The reverse is observed for the architectural specifications corresponding to the presence of (d) modules and (e) both modules and hubs. In (d) and (e), the relative increase in ρ compared to the base model (including both closeness and genetic similarity factors) is given. (f) Developmental change in the modularity coefficient (Q) that quantifies the extent that a network has a modular organization.