Predicting age and clinical risk from the neonatal connectome

The development of perinatal brain connectivity underpins motor, cognitive and behavioural abilities in later life. With the rise of advanced imaging methods such as diffusion MRI, the study of brain connectivity has emerged as an important tool to understand subtle alterations associated with neurodevelopmental conditions. Brain connectivity derived from diffusion MRI is complex, multi-dimensional and noisy, and hence it can be challenging to interpret on an individual basis. Machine learning methods have proven to be a powerful tool to uncover hidden patterns in such data, thus opening an opportunity for early identification of atypical development and potentially more efficient treatment. In this work, we used Deep Neural Networks and Random Forests to predict neurodevelopmental characteristics from neonatal structural connectomes, in a large sample of neonates (N = 524) derived from the developing Human Connectome Project. We achieved a highly accurate prediction of post menstrual age (PMA) at scan on term-born infants (Mean absolute error (MAE) = 0.72 weeks, r = 0.83, p<<0.001). We also achieved good accuracy when predicting gestational age at birth on a cohort of term and preterm babies scanned at term equivalent age (MAE = 2.21 weeks, r = 0.82, p<<0.001). From our models of PMA at scan for infants born at term, we computed the brain maturation index (i.e. predicted minus actual age) of individual preterm neonates and found significant correlation of this index with motor outcome at 18 months corrected age. Our results suggest that the neural substrate for later neurological functioning is detectable within a few weeks after birth in the structural connectome.


INTRODUCTION
Magnetic Resonance Imaging (MRI) has provided a broad range of in vivo insights about the structure and function of the human brain. Diffusion MRI in particular, enables the characterization of microstructural changes in orientation and structure of white matter, opening the possibility to study brain structural connectivity and allowing the systematic description of whole-brain structural networks: the human connectome (Honey et al., 2010;Sporns et al., 2005).
During the perinatal period, the brain undergoes significant changes and consolidation of its structural connectivity, which underpins the expansion of motor, cognitive and behavioural abilities (Johnson, 2001). Since the inception of connectomics (Hagmann 2005;Sporns, Tononi, and Kötter 2005) several studies have tried to characterise early development of the structural connectome (Fan et al. 2011;Hagmann et al. 2010). Subtle alterations in the development of brain connectivity have been suggested to underlie atypical neurodevelopmental outcome in populations with perinatal risk factors, such as children born preterm (Batalle et al., 2018). This is of clear significance as preterm birth comprises approximately 11% of all births, and is the main global cause of death and disability in children under 5 years of age (Blencowe et al., 2012), as well as representing one of the most pervasive perinatal risk factors for atypical neurodevelopment (Wood et al., 2000). It has been associated with an increased risk of developing neurodevelopmental conditions such as motor, visuospatial and sensorimotor delay (Marlow et al., 2007), inattention, anxiety and social difficulties (Johnson and Marlow, 2014), autism spectrum (Johnson et al., 2010), cerebral palsy (Marlow et al., 2005) or psychiatric disorders in adulthood such as depression and bipolarity (Nosarti et al., 2012).
With the increasing interest in detailed study of neonatal brain connectivity, projects such as the developing Human Connectome Project (dHCP) have arisen which allow the development of bespoke methods to study the brain during this crucial period. The dHCP is an open science project which provides a large normative sample of neonatal structural, diffusion and functional MRI data with high spatial, angular and temporal resolutions. Features of this project include: advances in hardware  and protocols for neonatal diffusion MRI acquisition ; the use of multiband techniques to accelerate acquisition time combined with approaches to correct motion (Cordero-Grande et al. 2016;Cordero-Grande et al. 2019); and the development of state-of-the-art preprocessing pipelines for neonatal MRI data Bastiani et al. 2019;Fitzgibbon et al. 2020;Makropoulos et al., 2018). These have together significantly improved neonatal MRI acquisition methods and data quality.
Despite this progress, studying the neonatal connectome remains challenging. Indeed, many methodological issues hamper the interpretation of the connectome (Sporns, 2013) including the difficulty of detecting origins and termination of connections (Jbabdi and Johansen-Berg, 2011) or a high number of false positive streamlines (Maier-Hein et al., 2017). Furthermore, the high dimensionality of the data and low number of scans usually available in neonates make underlying patterns very difficult to uncover.
A promising method in adult and neonatal neuroscience is the study of the "brain maturation index" (also known as "brain delta" or "predicted age difference") corresponding to the apparent age of the subject as compared to the norm (Dosenbach et al. 2010;Cao et al. 2015;Jonsson et al. 2019;Liem et al. 2017;Smith et al. 2019). By training regression models to fit the age of subjects from large normative imaging datasets, we can predict the age of individual subjects and compute the difference between the prediction and subject's true age. This difference gives information about brain maturation and its divergence from the population norm. As such, in adults, a positive (predicted age > true age) difference is interpreted as demonstrating accelerated ageing, and is associated with disorders such as cognitive impairment (Liem et al., 2017), schizophrenia (Koutsouleris et al., 2014) or diabetes (Franke et al., 2013). In neurodevelopment, studying the brain maturation index is therefore extremely relevant for preterm born infants where neurodevelopmental delays and psychiatric disorders often occur Galdi et al., 2020;Rasmussen et al., 2017).
In this work, we propose the use of two different machine learning algorithms -Random Forests (RF) and Deep Neural Networks (DNN) -to predict PMA at scan (i.e. brain maturation), and gestational age (GA) at birth (i.e. degree of prematurity), from the neonatal structural connectome in a large sample of neonates scanned at term equivalent age. Using models obtained for the term-born cohort, brain maturation index was computed for our pretermborn cohort allowing us to assess whether it predicted neurodevelopmental outcome at 18 months.

Participants
All participants were part of the dHCP, approved by the National Research Ethics Service West London committee (14/LO/1169). 524 infants (240 female and 284 male), born between 23 +0 weeks and 42 +2 week of gestation, underwent MRI between 37 +1 weeks and 45 +1 weeks. The participant gestational age at birth (GA) and postmenstrual age at scan (PMA) distribution is presented in Figure 1A-B. Full participant clinical information is presented in Table 1.
The Bayley III Scales of Infant and Toddler Development (BSID-III) (Bayley 2006) were collected at 18 months corrected age and available for 314 infants including 50 preterm-born infants. We used scores of motor (fine and gross), communication (expressive and receptive) and cognitive (raw) score. Assessments were carried out by experienced paediatricians or psychologists. Detailed assessment distributions are presented in Table 1.

MRI acquisition
All scans were collected in the Evelina Newborn Imaging Centre based on the Neonatal Intensive Care Unit, St Thomas hospital London using a Philips Achieva 3T scanner (Best, NL). All scans were acquired using the dHCP neonatal brain imaging system which includes a 32 channel receive neonatal head coil (Rapid Biomedical GmbH, Rimpar, DE) . Informed written parental consent was obtained prior to imaging. Positioning of all infants was done with a lightweight protective "shell", which was positioned on an MRI safe trolley to ease transportation. Immobilization of the infants in the shell was done using bead filled inflatable pads (Pearltec, Zurich, CH). In addition to the pads, acoustic protection included earplugs moulded from a silicone-based putty (President Putty, Coltene Whaledent, Mahwah, NJ, USA) placed in the external auditory meatus and neonatal earmuffs (MiniMuffs, Natus Medical Inc, San Carlos, CA, USA). To avoid sudden sound changes which might wake up the infant, the MRI software was modified in order to gradually increase the noise from 0 to the average operating point . All scans were supervised by a paediatrician or neonatal nurse experienced in MRI procedures; vital signs including pulse oximetry, temperature and electrocardiography data were monitored throughout data acquisition. All infants were scanned during natural unsedated sleep following feeding.

Pre-processing and connectome generation
Tissue segmentation of T2-weighted volumes was performed using a neonatal specific segmentation pipeline (Makropoulos et al., 2014) and template ). Parcellation of 90 cortical and subcortical regions  adapted to the dHCP weekly age-dependant high-resolution bespoke template ) was propagated to each subject's T2w native space through non-linear registration based in a diffeomorphic symmetric image normalization method (SyN) available in ANTS software (Avants et al., 2011), using T2w contrast and tissue segmentation as input channels. Tissue maps and atlas parcellation were propagated from each T2w native space to each subject's diffusion native space with a rigid registration using b=0 volumes as target. All rigid registrations were performed with IRTK software (Schnabel et al. 2001). Details of the 90 cortical and subcortical regions are presented in Supplementary Table 1.
Diffusion MRI was reconstructed at an effective resolution of 1.5mm isotropic and denoised using a patch-based estimation of the diffusion signal based on random matrix theory (Veraart et al., 2016). Gibbs ringing was suppressed (Kellner et al., 2016) and B0 field map estimated from b=0 volumes in order to correct magnetic susceptibility-induced distortion using FSL Topup (Andersson et al., 2003). Data was corrected for slice-level motion and distortion in a data-driven q-space representation using a bespoke spherical harmonics and radial decomposition (SHARD) basis of rank 89 corresponding to spherical harmonics of order lmax=0,4,6,8 for each respective shell, with registration operating at a reduced rank of 22 . DWI intensity inhomogeneity field correction was performed using the ANTs N4 algorithm (Tustison et al., 2010). Tools and pipelines implemented in MRtrix3  were used for quantitative analysis of the diffusion MRI data. Developing neonatal brain tissue undergoes rapid changes in cellular properties and water content that can be to a first approximation captured by a non-negative linear combination of anisotropic signal from relatively mature WM and from isotropic free fluid . We use data from 20 healthy full term control babies from our sample to extract a set of two representative WM (Tournier et al., 2013) and fluid-like (Dhollander et al., 2016, Dhollander et al., 2018 signal fingerprint (response functions) that are used to deconvolve each subject's diffusion signal into a fibre orientation distribution (FOD) image, capturing WM-GM-like signal, and scalar fluid density image using the multi-tissue multi-shell constrained spherical deconvolution technique (Jeurissen et al. 2014). Residual intensity inhomogeneity were corrected and component densities calibrated using a multi-tissue logdomain intensity normalisation (Raffelt, et al. 2017). Resulting normalised WM-GM-like FODs were used to generate 10 million streamlines with an anatomically constrained probabilistic tractography (ACT) (Smith et al., 2012) with biologically accurate weights (SIFT2) . The fibre density SIFT2 proportionality coefficient (μ) for each subject was obtained to achieve inter-subject connection density normalisation. The structural connectivity network of each infant was then constructed by calculating the μ × SIFT2-weighted sum of streamlines connecting each pair of regions (thus built as a symmetric adjacency matrix of size 90x90).
In addition, we used 73 structural connectivity matrices obtained from an independent dataset (Batalle et al., 2017) to test the design of the initial hyperparameters and architecture for predictive algorithms presented in sections 2.4.3 and 2.4.4.

Prediction of age at scan and age at birth
All analyses on this section were performed using Python 3.7. The machine learning library Scikit Learn (Pedregosa et al., 2011) was used for training the RF algorithm. The deep learning framework Keras (version 2.0.3) (Chollet et al., 2015) was used to train the deep learning models.

Feature set
As the structural connectome is presented as a symmetric adjacency matrix (in our case of size 90x90, with 90 brain regions) the lower triangle of the matrix contains all information.
We thus extracted and reshaped the lower triangle of each subject's structural connectome as a 1D vector with number of connectivity elements n = 4005, thus leading to the ensemble of connectivity vectors across subjects: We normalized each data point across the training sets, and normalized the testing set with the training normalization values using a min-max normalization: Thus, assuming that testing data also falls between previous ranges, our training and testing data has the following form:

Prediction models
We carried a set of predictions of demographic information on different population samples using different regression algorithms. In each case, we fitted a regression model to predict a variable representing demographic information (e.g. GA at birth or PMA at scan) for subjects in our dataset. We thus had prediction Y' as follows: We computed the regressor that minimizes | ′ − | 2 . We used two different supervised machine learning regression algorithms to do this: RF and DNN.

Random Forests regression
RF are an ensemble learning method for classification and regression based on constructing a multitude of decision trees (weak learners) which are individually trained through the technique of "bagging". RF makes predictions by averaging the prediction of each individual tree, hence acting as a strong learner (Breiman L., 2001). For optimal performance, two main hyperparameters should be tuned: the number of trees (estimators) in the forest and the maximum depth of each tree. The number of trees determines the smoothness of the decision boundary and the depth corresponds to the maximum number of levels allowed for each tree. RF regressors' performance often depends on finding the optimal value for these to ensure that there is no overfitting or underfitting.
Here we use the RF regressor from the Scikit Learn RandomForestRegressor implementation (Pedregosa et al., 2011). The RF were trained using mean squared error (MSE) as loss function. Hyperparameters were tuned separately for the PMA at scan and GA at birth prediction by performing a grid search on a set of 73 structural connectomes from an independent dataset (Batalle et al., 2017). This allowed us to choose optimal parameters without overfitting our model to the studied data. Hyperparameters used are presented in sections 2.4.6 and 2.4.7.

Deep Neural Networks regression
Deep (Fully Connected) Neural Networks (DNN) are universal function approximators whose parameters can be trained to model complex nonlinear relationships between features and labels via backpropagation (Rumelhart et al., 1986). However, the performance of a DNN also depends on the hyperparameters: design choices are mainly related to the architecture of the network (the layer types, number of layers and number of nodes per layers, activation functions at each layer), the loss function, and the training method (the number of epochs, the optimization function and its parameters).
The DNN in this work were implemented using the deep learning library Keras (version 2.0.3) (Chollet et al., 2015). As performing a grid search to find the best model hyperparameters is computationally expensive when training DNN, we started with a basic architecture built from previous work and common DNN knowledge (Smith 2018), and subsequently optimized these via manual refinement architecture search. To avoid overfitting the model to the data used in this paper, this was done on the same set of 73 structural connectomes from an independent sample as was used for the RF training, independently for the GA at birth and PMA at scan prediction. For both prediction tasks, the models were trained using MSE as a loss function and the Adam optimizer (Kingma and Ba, 2015), albeit with different learning rates. Further details on the network architecture are included in sections 2.4.6 and 2.4.7.

Training and evaluation of the models
To assess the performance of the prediction models, the evaluation metric was calculated on test data excluded from training and hyperparameter tuning. We split the dataset into k groups (folds) and fit the model k times. Each time, one group is used to evaluate performance, while the rest of the groups are used for training and validation. The evaluation scheme is presented in Figure 2B.
We split the data into k=5 groups (folds), with 20% of data used for testing at each fold. The remaining 80% of the data were further split for training (65%) to fit the models and validation (15%) to tune the hyperparameters. Min-max normalisation presented in section 2.4.1 is fitted on the training/validation set, where normalization parameters are saved and applied to the test set.
We added a bias correction as previously described Peng et al. 2019) to correct age dependency of the training residuals. Briefly, we used a linear model ′ = ( ) = + to obtain an unbiased estimate of ′ as ̂ = ′ − , where the parameters and are estimated during training (on both the combination of training and validation set) and are thus applied directly to the test set. We obtained our final corrected prediction ̂ for each structural connectome as follows: The final performance is calculated by averaging test-set performance over the 5 folds. We used mean absolute error (MAE) as our evaluation metric, calculated on each test set k as follows: Where is the number of subjects belonging to test set k (S ) and and ̂ are actual and predicted outcome of subject i. In addition, we also evaluate MSE and 2 scores for each test set k, which are calculated as follows: Where ̅ is the mean actual age of test set k. We also calculated Pearson's Correlation ( ) and p-value ( ) between actual ( ) and predicted output (̂) for each test set. As we obtain a prediction for every subject (albeit with different models) we can compute the , , 2 , and by considering the predictions of the 5 test sets (see Figure 2A). Finally, we assessed the presence of heteroscedasticity in our predictions by comparing the variance 2 of the absolute error -a lower variance signifies more homoscedastic predictions.

Prediction of PMA at scan in term-born infants
To build a model of typical development of connectivity we used the full cohort of 418 termborn babies (GA at birth >= 37) with PMA at scan between 37 and 45 weeks.
We first predicted PMA at scan from the vectorised and normalized structural connectome ̃ using RF regressor model. Optimal parameters of the model (max depth = 250, number of estimators = 30) were found by performing a grid search in an independent dataset (see section 2.4.3). We trained each fold on N335 samples (80%) including a validation set. We then tested the model on the remaining set (N83, 20%) in each fold, thus being able to predict age at scan on all 418 structural connectomes of term infants (see Figure 2A).
In a similar fashion, we also trained a regression DNN to predict PMA at scan from the vectorised and normalized structural connectome ̃. This DNN comprises one input layer with 4005 input nodes, 7 hidden layers, 6 activation layers (ReLu), 5 batch normalisation layers and one output layer with one node. Training was done for 50 epochs with learning rate of 0.007 and remaining parameters with default value. Detailed structure of the architecture of this DNN is provided in Figure 2D.
We applied the previously described bias correction method on both DNN and RF, by fitting and for each model using both the training and validation set; thus reaching 5 distinct models 1 , 2 , … , 5 for both the DNN and RF methods.

Prediction of GA at birth
To assess the effect of preterm birth on structural connectivity we trained a prediction model for GA at birth from ̃ with both DNN and RF in a similar fashion as previously described for prediction of PMA at scan.
Since the dHCP cohort has significantly more term-born than preterm-born infants, there is a "class imbalance" in the GA distribution that may skew the model prediction. We therefore randomly selected a sub-sample of term subjects that had, on average, equal density of subjects on each GA at birth weekly bin. Our 106 preterm infants were distributed in 15 different GA at birth bins (22w-23w; 23w-24w … 36w-37w), thus providing an average of 7 infants per age category. We kept all 106 preterm infants and randomly sampled 7 infants from each of the term age categories (37w-38w, 38w-39w, … 41w-42w) and the 4 subjects born 42w-43w (as only 4 were born between 42w and 43w GA), for a total of 39 term-born infants. This resulted in a total of 145 infants with a balanced distribution (see Figure 1C).
We first attempted prediction of GA at birth from the vectorised and normalized structural connectome ̃ using RF with optimal parameters max depth = 300, number of estimators = 50. For each fold, we trained using N=116 samples (80%) including a validation set. We then tested the model on the remaining set (N=29, 20%) in each fold, predicting GA at birth for all 145 structural connectomes considered.
In a similar fashion, we also trained a DNN to predict GA at birth. This DNN consists of one input layer, 6 hidden layers, 6 activation layers (ReLu), one dropout layer, and one output layer. 120 epochs were used for training, with learning rate 0.003 and remaining parameters at default value. Detailed information on the architecture is provided in Figure 2E.
We applied the bias correction method on both DNN and RF by fitting and for each model ℎ from the validation set; thus reaching 5 distinct models 1 , 2 , … , 5 for both the DNN and RF methods.

Brain maturation index
We defined brain maturation index (also called brain age or predicted age difference in the literature) as the difference between the predicted age ̂ and true age of a subject n (Dosenbach et al., 2010): =̂− We developed a model of typical brain development by training 5 models to predict PMA at scan on term-born infants only (section 2.4.6). Prediction of PMA at scan for each preterm subject was computed by taking the mean of the predictions from each of the 5 DNN trained models from each cross-validation partition: Following this, we computed the brain maturation index of each preterm subject:

Statistical methods
Differences between term and preterm cohorts on all relevant characteristics were assessed by computing a two tailed independent t-test or chi-square test as appropriate. The association between brain maturation index and neurodevelopmental outcomes was assessed with Pearson's Correlation coefficient for all preterm infants having both and BSID-III developmental outcomes at age 18 months corrected age. All outcomes were corrected for socio economic status, captured by the English Index of Multiple Deprivation (IMD). IMD factor summarizes information from 38 different factors such as income, employment, education, crime rates and health situation for all neighbourhoods in England (Index of Multiple Deprivation, 2015). Lower IMD relates to lower level of deprivation. All pvalues presented are uncorrected for multiple comparisons.

Data availability
The imaging and collateral data from the dHCP can be downloaded by registering at https://data.developingconnectome.org/ Structural connectivity networks and code used to predict age at birth and age at scan are available in https://github.com/CoDe-Neuro/Predicting-age-and-clinical-risk-from-theneonatal-connectome

Sample characteristics
There were no significant differences in PMA at scan and male/female proportion between term and preterm neonates in this study. For the subjects for which 18 months BSID-III followup neurodevelopmental assessment was available, there were no significant differences in outcomes between term and preterm infants. However, a significant group difference (p<<0.001) was found in IMD scores, with term infants showing significantly higher deprivation than preterm infants. Detailed cohort characteristics are provided in Table 1 and Figure 1. Neurodevelopmental outcome details are provided in Table 1.

Prediction of PMA at scan with RF
We trained a RF regressor to fit for PMA at scan from vectorised and normalized structural connectome ̃ on term infants only. We obtained = 0.84 weeks, = 1.10, 2 = 0.61, 2 = 1.10 with correlation between true and predicted = 0.79 ( ≪ 0.001). Figure 3A shows true PMA vs predicted PMA on each of the 5 cross-validation folds. Detailed results of each fold are presented in Figure 3C.

PMA at scan prediction with DNN
Similarly, we trained a DNN regressor to fit for PMA at scan from vectorised and normalized structural connectome ̃ on term infants only. We obtained = 0.72 weeks, = 0.94, 2 = 0.67, 2 = 0.94 with correlation between true and predicted = 0.83 ( ≪ 0.001). Figure 3B shows true PMA vs predicted PMA on each of the 5 crossvalidation folds. Detailed results of each fold are presented in Figure 3D.

Prediction of GA at birth with RF
We trained a RF regressor to fit GA at birth from vectorised and normalized structural connectome ̃ on balanced data (145 infants). We obtained = 2.76 weeks, = 12.95, 2 = 0.43, 2 = 12.93 with correlation between true and predicted = 0.67 ( ≪ 0.001). Figure 4A shows true GA at birth vs predicted GA at birth on each of the 5 folds fold. Detailed results of each fold are presented in Figure 4B.

Prediction of GA at birth with DNN
Similarly, we trained a DNN from vectorised and normalized structural connectomẽ on balanced data. We obtained = 2.21 weeks, = 8.90, 2 = 0.61, 2 = 8.86 with correlation between true and predicted = 0.82 ( ≪ 0.001). Figure 4B shows true GA vs predicted GA on each of the 5 folds. Detailed results of each fold are presented in Figure 4D.

Brain Maturation Index
We computed the predicted PMA at scan of each preterm-born infant scanned at termequivalent age by averaging out the 5 predictions from the DNN term trained models. We obtained MAE of 1.16 weeks on prediction of 106 preterm infants ( = 2.24, 2 = 0.42, 2 = 1.45), with correlation between true and predicted age between = 0.79, ( ≪ 0.001). True vs predicted age is presented in Figure 5A.
We computed the brain maturation index from each prediction. Brain maturation index ( ) was significantly correlated with BSID-III gross motor scale at 18 months corrected age ( = 0.4590, = 0.0008, Figure 5BC).

Discussion
This work demonstrates that machine learning can uncover connectivity patterns associated with typical and atypical development of structural brain connectivity, despite the high dimensionality of the data. We obtained accurate prediction of PMA at scan in term-born infants in a large normative sample of high-quality neonatal MRI data from the dHCP and predicted GA at birth in preterm infants from scans at term equivalent age, showing the impact of preterm birth on the structural connectome. Furthermore, we have also shown significant correlation between brain maturation index estimated from brain connectivity at birth and BSID-III gross motor outcome at 18 months of corrected age in preterm babies. Overall, our results show that machine learning approaches can extract relevant information on brain development from the neonatal structural connectome.

Prediction of age at scan
We achieved high accuracy in our prediction of PMA at scan on the term cohort, reaching a low MAE of 0.72 weeks ( = 0.94, 2 = 0.67 ) and a high correlation between true and predicted age ( = 0.83; ≪ 0.001). While the structural connectome presents several important challenges to study brain connectivity (Campbell and Pike, 2014) including high numbers of false positive streamlines (Maier-Hein et al., 2017), our results suggest that there is reliable information present to capture the subtle changes associated with weekly development. There have been several studies evaluating white matter microstructural and connectivity changes during the first days after birth. Indeed, it has been found that the postnatal period is marked by further dendritic arborization, refinement of existing intracortical connections, and an increase in synaptogenesis which results in an abundance of connections (for a review see (Keunen et al., 2017)). Some important changes have also been found in the structural connectome in the early postnatal period, mainly an increase in integration (the ease with which different brain regions communicate) and segregation (presence of clusters, i.e., capacity for specialised processing) (Batalle et al., 2017). These changes are likely captured by DNN and underlie the accurate prediction of PMA at scan.
Predicting PMA at scan from the structural connectome has, to our knowledge, only previously been done by Kawahara et al., where a MAE of 2.17 weeks and a correlation between true and predicted age of 0.87 were achieved in a cohort of 115 preterm infants (between 24 and 32 weeks PMA) (Kawahara et al., 2017). Although we achieved comparable correlation between true and predicted age, we achieved an improved MAE, which is likely due to the larger high-quality dataset and use of normative population.

Prediction of age at birth
Several studies have assessed the effects of prematurity in brain structure. Volumetric changes in the cerebellum (Limperopoulos et al., 2010), cortical and subcortical grey matter (Padilla et al., 2015) and altered shape of hippocampus (Thompson et al., 2009) have all been associated with preterm birth. The preterm brain white matter is also affected -studies have demonstrated significant alterations of white matter microstructure in the preterm infant without visible focal lesions on clinical MRI (Anjari et al., 2007;Hüppi et al., 1998), as well as more overt white matter injuries such as periventricular leukomalacia (Counsell et al., 2003;Volpe, 2003). Other studies have focused on the impact of preterm birth on brain connectivity. Interestingly, several key components of structural connectivity appear to be unaltered by prematurity including rich club organization (Ball et al., 2014) and core connections (Batalle et al., 2017). However, whilst these core features are preserved, various other components are significantly impacted by the degree of prematurity, including decreased local connectivity in various brain regions (such as cerebellum and superior frontal lobe) (Batalle et al., 2017), or reduced fractional anisotropy in the corticospinal tracts and corpus callosum (Ball et al., 2012).
These alterations likely underlie the ability of our DNN to clearly decipher the level of prematurity of infants from the structural connectome and led to a relatively good prediction of GA at birth (DNN performance of = 2.21 ; = 8.90; 2 = 0.61; = 0.82; ≪ 0.001) from the structural connectome of preterm-born infants scanned at term equivalent age. This shows that the structural connectome contains significant markers of cerebral white matter abnormalities, which are characteristic of prematurity level. This result is in keeping with an earlier study by Brown and colleagues which achieved prediction with precision of 1.6 weeks from the structural connectome of preterm infants only (77 scans, GA at birth between 24 and 32 weeks) using RF . This high performance is likely due to the reduced age range of the cohort compared to our study. Smyser and colleagues also attempted to predict preterm-birth from brain networks obtained from functional MRI, using support vector machines with a cohort of 100 babies (50 preterm) scanned at term equivalent age (Smyser et al., 2016), although their classification was only dichotomic. They achieved accuracy of 84% in classifying term-vs preterm-birth, suggesting that alterations in brain connectivity are also present in functional connectivity networks.

Brain maturation index
Brain maturation indices have been suggested as a powerful tool to capture alterations in the maturational trajectories of brain connectivity (Cao et al., 2015). Association with neurodevelopmental outcomes such as BSID-III is then important to evaluate the lasting impact of this predicted delay. Our bespoke brain maturation index was thus computed for all preterm infants in which BSID-III outcome were available (50 infants). Contrary to adulthood where a positive (predicted age > true age) is associated with emergence of various disorders such as cognitive decline (Jonsson et al., 2019), a negative (predicted age < true age) can be associated with developmental delay in neonates. This hypothesis was verified as we found a positive correlation between the brain maturation index and BSID-III gross motor outcome in our cohort of preterm infants at 18 months of corrected age. This is in accordance with previous findings which have linked preterm birth to motor delay in later development (Foulder-Hughes and Cooke, 2007). This suggests that the brain maturation index may be a useful tool to capture potential delays and disorders in structural connectivity which may have a lasting impact on later neurological outcomes. Recent studies have shown the potential of normative modelling to find individual alterations in preterm babies (Dimitrova et al., 2020;O'Muircheartaigh et al., 2020), which are characterised by heterogeneous brain changes. In a similar way, we suggest that for an individual subject, a high deviation from the population norm, translating to age predictions significantly lower than true age (negative brain maturation index) can be a marker of potential developmental delay so that these subjects should undergo further tests and may need follow up. Therefore, this marker may provide an opportunity for early preventive intervention, as other similar studies suggest (Cao et al., 2015).

Methodological considerations
We have assessed the performance of RF and DNN for prediction of key developmental characteristics in a large sample of neonates. Although both RF and DNN have been used extensively in the literature for brain imaging analysis, to our knowledge this is the first application in a large cohort of neonates. Previous studies have used convolutional neural networks (Kawahara et al., 2017) to extract information from the structural connectome which are particularly useful for data known to have local correlation, such is the case with segmenting brain MR images (Lecun et al., 1998). However, since the spatial distribution of adjacency matrices are not reflective of brain region locality and connectivity characteristics, we instead chose to use DNN in our investigation. Using this approach, we achieved better performance compared to RF on age prediction from the neonatal structural connectome. Although prediction of PMA at scan was also highly accurate with RF ( = 0.84; = 1.10; 2 = 0.61; 2 = 1.10; = 0.79) DNN achieved better performance ( = 0.72; = 0.94; 2 = 0.67 ; 2 = 0.94; = 0.83) with a more homoscedastic distribution of predictions over each of the 5 cross-validation folds (Figure 3). The improved performance of DNN over RF was more evident for prediction of GA at birth, with a better performance and more homoscedastic distribution of predictions on each fold with DNN ( = 2.21; = 8.90; 2 = 0.61; 2 = 8.86; = 0.82) over RF ( = 2.76; = 12.95; 2 = 0.43; 2 = 12.93; = 0.67) ( Figure  4A-D). Our choice of undersampling the term cohort to achieve balance between age categories, although diminishing the sample size, was necessary to avoid a class imbalance problem, which could have caused a systematic positive bias for preterm infants (predicted GA > true GA). The relatively good performance of our model suggests that the impact of preterm birth on brain connectivity development is important and clearly apparent on the neonatal structural connectome.
We achieved high performance in the prediction of PMA at scan in our preterm cohort by averaging the predictions from all 5 term trained DNN models ( = 1.16; = 2.24 ; 2 = 0.42 ; 2 = 1.45; = 0.79; ≪ 0.001). Although there was high accuracy in prediction and correlation between true and predicted age for preterm infants, predictions were on average inferior to those obtained for PMA at scan for term infants ( Figure 5A). This is expected as preterm neonates are known to have specific differences in structural connectivity when compared with their term counterparts which may have reduced the generalizability of the predictive model (Ball et al., 2012;Batalle et al., 2017;Smyser et al., 2010). The difference between true and predicted PMA (brain maturation index) varies greatly across subjects, which we hypothesise is representative of the severity of the delay.

Limitations
There are several limitations to this work. Firstly, although the dHCP data set is the largest of its kind, a key next step will be to see whether the findings generalise to other populations as the regression algorithms were built and trained with the specific MRI acquisition protocol, brain parcellation and connectome generation methods developed for that project. To enable this, our predictive algorithms have been made publicly available, so other researchers can evaluate their performance in data sets with different acquisition and processing protocols. As the data was normalized prior to training, given a similar parcellation, we hypothesize that significant correlation between true and predicted GA at birth or PMA at scan should be obtained if tested on different data. There is increasing interest in predicting outcomes at age 18-24 months directly from neonatal brain connectivity, as done in (Girault et al., 2019). We have implemented our own version of their method and tested on the sub-set of our data set with available BSID-III at 18 months. However, no significant prediction capacity was reached with our data (data not shown). This might be due to different developmental outcome (Mullen scale instead of BSID-III), as well as differences in the pre-processing pipeline, or differences in the sample size and characteristics.
The recent progress in geometric deep learning, a new type of deep neural models specifically designed for data in non-Euclidean space (such as graphs), could be of great potential to improve the results of dense neural networks (Bronstein et al., 2017).
It remains difficult to deploy this type of study to clinical settings, as the highly nonlinear and multidimensional inner working of the algorithms are difficult to interpret by humans. Recent progress in deep learning explainability is of great potential to help on that matter (for a review, see (Xie et al., 2020)). In this context, identifying which specific edges influenced the decision of the network for a specific prediction could help clinicians to diagnose specific neurodevelopmental disorders and may have implications for targeted intervention.

Conclusion
In this work, we have used DNN to uncover important demographic and clinical information from the neonatal structural connectome, for the first time in a large sample of normative neonates. We achieved a MAE of 0.72 weeks in predicting PMA at scan, demonstrating that the neonatal structural connectome contains key developmental information. Furthermore, our prediction of GA at birth, with MAE of 2.21 weeks, shows that the patterns characteristic of prematurity are clearly present in the neonatal connectome, and can be uncovered with machine learning approaches. Finally, our brain maturation index computation on the preterm cohort was significantly correlated to BSID-III motor outcome at corrected age of 18 months. Brain maturation index thus appears to be a promising biomarker for prediction of neurodevelopmental disorders and delays, opening a potential path for early diagnosis and prevention of disorders in preterm born neonates.