## Abstract

Measurements from structural brain magnetic resonance imaging (MRI) scans have been increasingly analyzed as intermediate phenotypes to bridge the gap between clinical features and genetic variation. To date, most imaging phenotypes are scalar, such as the volume of a brain region, which can miss subtle or localized morphological variation associated with genetics or relevant to disease. Neuroanatomical shape measurements — multidimensional geometric descriptions of a brain structure — provide an alternate class of phenotypes that remain largely unexplored. In this paper, we extend the concept of heritability to multidimensional traits, and present the first comprehensive analysis of the heritability of neuroanatomical shape measurements across an ensemble of brain structures based on genome-wide single nucleotide polymorphism (SNP) and MRI data from 1,317 unrelated, young (18-35 years) and healthy individuals. Our results demonstrate that neuroanatomical shape can be significantly heritable, above and beyond volume, and thus can serve as a complementary phenotype to study the genetic underpinnings and clinical relevance of brain structure.

A broad range of psychiatric and neurological disorders, including schizophrenia, autism spectrum disorder, bipolar disorder and Alzheimer’s disease, are highly heritable [1, 2]. The exponential progress in genomic technologies has accelerated the examination of the complex genetic underpinnings of these diseases. For example, recent large-scale genome-wide association studies (GWAS) have provided insights about common genetic variants linked with various clinical conditions [3–6].

Brain imaging is playing an increasing role in the study of the relationship between genetic variants, neuroanatomy, behavior and disease susceptibility [7–10]. To date, most structural neuroimaging genetics studies have utilized volumetric phenotypes, such as the size, average cortical thickness, or surface area of a brain region, to yield important discoveries about the genetic basis of brain morphology [see e.g., 11–14]. While these measurements capture a few basic dimensions of anatomical variability, they provide a limited description of the underlying geometry.

Neuroanatomical shape measurements — multidimensional geometric descriptions of a brain structure — have attracted increasing attention in medical image analysis. Shape measurements characterize isometry-invariant (in particular, independent of location and orientation) geometric attributes of an object, which provide a rich description of an anatomical structure and can encompass volumetric variation. Such measurements may thus offer increased sensitivity and specificity in examining the clinical relevance and genetic underpinnings of brain structure. Recent studies have shown that the shape of subcortical brain regions and cortical folding patterns provide information not available in volumetric measurements that is predictive of disease status, onset and progression in schizophrenia [15–17], autism [18, 19], bipolar disorder [20, 21], Alzheimer’s disease [22–25], and other mental disorders [26, 27]. There is also increasing evidence that genetic variants may have influences on brain morphology that can be captured by shape measurements [28–32].

This paper makes two major contributions to the investigation of the genetic basis of neuroanatomical shape. First, we extend the theoretical concept of heritability to multidimensional traits, such as the shape descriptor of an object, and propose a novel method to estimate the heritability of multidimensional traits based on genome-wide single nucleotide polymorphism (SNP) data from unrelated individuals (known as SNP heritability). Our estimation method builds on genome-wide complex trait analysis (GCTA) [33, 34] and phenotype correlation-genetic correlation (PCGC) regression [35], and generalizes these techniques to the multivariate setting. Second, using structural MRI and SNP data from 1,317 unrelated individuals collected as part of the Harvard/Massachusetts General Hospital (MGH) Brain Genomics Superstruct Project (GSP) [36, 37], we present the first comprehensive heritability analysis of the shape of an ensemble of brain structures, quantified by the truncated Laplace-Beltrami Spectrum (LBS) (also known as the “Shape-DNA”) [38–40], in this young (18-35 years) and healthy cohort, and devise a strategy to visualize primary modes of shape variation.

The truncated LBS is a multidimensional shape descriptor, which can be obtained by solving an eigenvalue problem on the 2D boundary surface representation of an object. It is invariant to the representation of the object, including parameterization, location and orientation, and thus does not require spatial alignment with a population template, making it computationally efficient and robust to registration errors. LBS also depends continuously on topology-preserving deformations, and is thus suitable for quantifying differences between shapes. Recent empirical evidence suggests that the LBS-based shape descriptor provides a discriminative characterization of brain anatomy and offers state-of-the-art performance for a range of shape retrieval and segmentation applications [41, 42]. A collection of the descriptors of brain structures, known as *BrainPrint*, can provide an accurate and holistic representation of brain morphology, and has been successfully applied to subject identification, sex and age prediction, brain asymmetry analysis, twin studies, and computer-aided diagnosis of dementia [40, 43]. Our LBS-based heritability analyses demonstrate that neuroanatomical shape can be significantly heritable, above and beyond volume, and yield a complementary phenotype that offers a unique perspective in studying the genetic underpinnings of brain structure.

## Results

**Heritability of the volume of neuroanatomical structures**. To benchmark our shape results, we first computed SNP heritability estimates for the volumetric measurements of an array of brain regions. Table 1 lists these heritability estimates after adjusting for intracranial volume (ICV or head size) as a covariate. Point estimates of the heritability of volumetric measurements suggested that several neuroanatomical structures have moderately heritable volumes. In particular, the caudate, corpus callosum, 3rd and 4th ventricles and putamen all had volume heritability estimates greater than 30%. Table 1 further includes *p*-values for the statistical significance of the heritability estimates. The parametric (Wald) and non-parametric (permutation-based) *p*-values were virtually identical, confirming the accuracy of the standard error estimates we computed (Online Methods). We observe that none of the volume heritability estimates were statistically significant after correcting for multiple comparisons (e.g., via false discovery rate or FDR at *q =* 0.05), likely due to sample size limitations. Only the volumes of the caudate, corpus callosum and 3rd ventricle achieved a heritability that was nominally significant in our sample (uncorrected *p* < 0.05). Table 1 also includes test-retest reliability estimates of volume after regressing out ICV, computed as Lin’s concordance correlation coefficient [44] using measurements from 42 subjects with repeated scans on separate days. Almost all structures had a volume estimate reliability greater than 0.75 except for the pallidum. There was no significant correlation between the reliability and heritability estimates of volume (*p* = 0.906).
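Lin's concordance correlation coefficient for paired measurements $x$ and $y$ is $\rho_c = 2s_{xy}/(s_x^2 + s_y^2 + (\bar x - \bar y)^2)$; unlike Pearson's correlation, it also penalizes a systematic offset between sessions, which is what a test-retest agreement measure should do. The following is a minimal sketch, assuming the ICV-adjusted residuals for the two scan sessions have already been computed; the five-subject values are made up for illustration and are not GSP data.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two repeated measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population (1/n) moments, as in Lin's definition
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Hypothetical ICV-adjusted volumes for five subjects scanned on two days
scan1 = [1.02, 0.95, 1.10, 0.88, 1.05]
scan2 = [1.00, 0.97, 1.08, 0.90, 1.04]
print(round(lins_ccc(scan1, scan2), 3))
```

Adding a constant to every second-session value leaves Pearson's *r* unchanged but lowers the concordance coefficient, which is why the latter is the natural choice for reliability.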

**Heritability of the shape of neuroanatomical structures**. Neuroanatomical shape provides a geometric characterization and a rich description of a brain structure. We therefore hypothesize that analyzing the shape variation of neuroanatomical structures can identify genetic influences beyond those captured by volumetric measurements. Figure 1 and Table 2 show the SNP heritability estimates of the shape of an ensemble of brain structures. These estimates were computed from LBS descriptors normalized for size, with the volume of the corresponding structure explicitly included as a covariate to account for potential volume effects. A number of structures showed moderate to high SNP heritability. Specifically, the shape of the caudate, cerebellum, hippocampus, 3rd ventricle and putamen exhibited heritability estimates greater than 30%. All these estimates were statistically significant after correcting for an FDR at *q =* 0.05. This is in contrast with the case of volume, where, despite a similar range of heritability estimates, no estimate reached FDR-corrected significance, likely due to sample size limitations. The main reason for this discrepancy is the theoretically guaranteed reduction in the standard errors of heritability estimates for multidimensional traits (see Online Methods for a theoretical treatment). The shapes of the accumbens area and corpus callosum were also nominally significantly heritable (uncorrected *p* < 0.05). As in the case of volume, the parametric (Wald) *p*-values were virtually identical to the permutation *p*-values, suggesting that our standard error estimates were accurate (Online Methods).

Table 2 also lists test-retest reliability estimates for the shape of different structures. Analogous to the case of volume, we quantified reliability as the average Lin’s concordance correlation coefficient of the individual components of the multidimensional shape descriptor, using the 42 subjects with repeat scans on separate days. These results suggest that the LBS-based shape descriptors were overall less reliable than volumetric measurements, with half of the structures exhibiting a shape reliability less than 0.75. This is likely due to the greater sensitivity of shape, relative to volume, to segmentation differences. Furthermore, there was a marginally significant correlation between the reliability and heritability of shape (Pearson’s *r* = 0.562 and *p* = 0.057). Taken at face value, roughly 30% (*r*² ≈ 0.32) of the variation in shape heritability across structures can be attributed to the reliability of the shape descriptor. Consequently, for structures that exhibited low shape heritability (e.g., amygdala), a more accurate image segmentation and shape analysis pipeline might yield an increased estimate of heritability. We further conducted a sensitivity analysis of shape heritability estimates with respect to the two free parameters of the LBS-based shape descriptor: the number of eigenvalues incorporated and the amount of smoothing applied to the surface mesh representing the geometry of the object. Supplementary Figure S1 shows that the heritability estimates were largely robust to variations in these parameters.
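Shape reliability, as described above, averages the per-component concordance over the $M$ entries of the descriptor. The sketch below assumes the descriptors of the two sessions are stored as (subjects × components) arrays and uses simulated data (42 subjects and a hypothetical $M = 10$); it also checks the arithmetic behind the shared-variance statement, $r^2 = 0.562^2 \approx 0.316$.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

def shape_reliability(desc1, desc2):
    """Average Lin's CCC over the M columns of two (n_subjects x M) descriptor arrays."""
    return float(np.mean([lins_ccc(desc1[:, m], desc2[:, m]) for m in range(desc1.shape[1])]))

rng = np.random.default_rng(0)
true_desc = rng.normal(size=(42, 10))                      # 42 subjects, hypothetical M = 10
scan1 = true_desc + 0.3 * rng.normal(size=true_desc.shape)  # simulated session-1 descriptors
scan2 = true_desc + 0.3 * rng.normal(size=true_desc.shape)  # simulated session-2 descriptors
print(shape_reliability(scan1, scan2))

# Share of across-structure variance in shape heritability linearly explained by reliability
r = 0.562
print(round(r ** 2, 3))  # 0.316
```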

**Visualizing the principal mode of shape variation**. The LBS-based shape descriptor is suitable for efficiently and accurately extracting intrinsic properties of the shape of brain structures from a large number of individuals, but is not designed for visually inspecting shape differences. Here, we devised a strategy to visualize the principal mode of shape variation. Specifically, it can be shown that the first principal component (PC) of the multidimensional LBS-based shape descriptor captures the greatest shape variation and has the largest impact on the overall heritability estimate of the shape (Online Methods). We thus visualized shape variation along the first PC of the shape descriptor for brain structures with significantly heritable shapes: right caudate, right hippocampus, left putamen, cerebellum and 3rd ventricle. Illustrations of the contralateral structures (i.e., left caudate, left hippocampus and right putamen), which showed similar shape variation, are provided in Supplementary Figure S2. In each panel of Figure 2, the structure is represented by a sample-specific population average, on which the average shapes at the two extremes of the principal axis (±2 standard deviations, SD) with identical volume (−2 SD, blue; +2 SD, red) are depicted. Blue regions indicate where shapes around −2 SD are larger than shapes around +2 SD, and vice versa for the red regions.
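One way to locate the two extremes of the principal axis is sketched below, assuming the size-normalized, volume-adjusted descriptors are stacked as a (subjects × $M$) array. The SVD-based PCA, the simulated descriptors and the ±0.5 SD window used to pick subjects near ±2 SD are illustrative choices, not necessarily the procedure used for Figure 2, and the step that averages the corresponding surface meshes is not shown.

```python
import numpy as np

def principal_mode(descriptors):
    """First principal axis, its explained-variance ratio, and per-subject scores."""
    X = descriptors - descriptors.mean(axis=0)            # center each descriptor component
    _, s, Vt = np.linalg.svd(X, full_matrices=False)      # rows of Vt are principal axes
    pc1 = Vt[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pc1, explained, X @ pc1

rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 5)) * np.array([3.0, 1.0, 0.5, 0.5, 0.2])  # simulated descriptors
pc1, explained, scores = principal_mode(desc)

sd = scores.std()
near_minus = np.abs(scores + 2 * sd) < 0.5 * sd   # subjects around -2 SD (hypothetical window)
near_plus = np.abs(scores - 2 * sd) < 0.5 * sd    # subjects around +2 SD
print(f"PC1 explains {explained:.0%}; {near_minus.sum()} near -2 SD, {near_plus.sum()} near +2 SD")
```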

The first PC of the right caudate captured 77% of the shape variation and had a heritability estimate of 0.82. Moving along the principal mode of shape variation, the right caudate had a shorter (longer) tail with a corresponding larger (smaller) head. For the right hippocampus, the first PC explained 69% of the shape variation, had a heritability of 0.47, and exhibited dorsoventral widening (narrowing) of the body and a corresponding lateral and anterior-posterior contraction (expansion). The first PC of the left putamen explained 61% of the shape variation, had a heritability of 0.61, and captured lateral widening (narrowing) and a corresponding contraction (expansion) along the dorsoventral and anterior-posterior axes. The first PC of the cerebellum explained 69% of the shape variation and had a heritability of 0.62. A clear expansion (contraction) of the anterior lobe and a corresponding contraction (expansion) of the posterior lobe can be observed along the principal axis. Finally, the first PC of the 3rd ventricle captured 69% of the shape variation and had a heritability estimate of 0.74. The principal mode captured enlargement (shrinking) of the anterior and posterior protrusions coupled with a corresponding contraction (expansion) of the lateral walls and the roof of the cleft.

## Discussion

This work makes two significant contributions to neuroscience and genetic research. First, we extended the concept of heritability to multidimensional traits and presented an analytic strategy that generalizes SNP heritability analysis. The heritability estimator we proposed for multidimensional traits has reduced uncertainty in the point estimate relative to univariate estimates and thus offers more statistical power. Our empirical analyses confirmed this theoretical expectation. Moreover, we provided methods that can easily adjust for covariates in multivariate models, as well as both parametric and nonparametric inferential tools to assess the significance of a heritability estimate. Our approach opens the door to the genetic characterization of shape measurements and other multidimensional traits.

Second, we used the proposed approach to quantify the SNP heritability of the shape of an ensemble of brain structures. The shape of caudate, cerebellum, hippocampus, 3rd ventricle and putamen exhibited moderate to high heritability (i.e., greater than 30%), after controlling for volume. All of these estimates achieved FDR-corrected significance at *q* = 0.05. This is in contrast to the volume heritability estimates of the same set of brain structures on the same sample, none of which achieved FDR-corrected significance, likely due to sample size limitations. Our results represent the first comprehensive heritability analysis of the shape of anatomical structures spanning the human brain in a group of healthy subjects.

A handful of prior neuroimaging studies have explored the shape of certain brain structures as potential phenotypes in examining genetic associations. For example, Qiu et al. [28] and Shi et al. [29] reported influences of the apolipoprotein E (APOE) *ε*4 allele on hippocampal morphology in patients with depression and Alzheimer’s disease. Variants involved in the regulation of the *FKBP5* gene were recently associated with hippocampal shape [30]. A meta-analysis [32] identified a genome-wide significant SNP that affects the shape of the putamen bilaterally. Prior studies have also estimated the heritability of shape based on familial relatedness. In a recent study, the heritability of the shape of subcortical and limbic structures was estimated using data from multigenerational families with schizophrenia [31]. In other related work, Mamah et al. [46] and Harms et al. [47] revealed shape abnormalities in basal ganglia structures (caudate, putamen and globus pallidus) and the thalamus in siblings of schizophrenia patients. An application of the LBS-based shape descriptor to twin data found increased shape similarity of brain structures in monozygotic twin pairs relative to dizygotic twins, indicating genetic influences on brain morphology [40], although heritability was not estimated.

However, to date, outside of these notable exceptions, most structural imaging genetics studies have utilized scalar measurements (e.g., volume, thickness, area) as phenotypes. In the present study, we accounted for potential volume effects in our shape analyses by normalizing the LBS-based shape descriptor for size and additionally including the volumetric measurement of the corresponding structure as a covariate when estimating heritability. Our results showed that shape measurements provide a rich and novel set of phenotypes for exploring the genetic basis of brain structure, and may identify novel genetic influences on the brain that are not detectable with conventional analyses based on the volume of structures.

There are several biological mechanisms that might lead to shape differences with minimal effect on the overall size of the structure. These include localized volumetric effects that are confined to subfields, sub-nuclei or other sub-regions that make up the structure. Shape analysis may further provide significant information about neurodevelopmental abnormalities, such as those associated with neuronal migration, synaptogenesis, synaptic pruning and myelination. Shape measurements might for example shed light on morphogenetic mechanisms that involve mechanical tensions along axons, dendrites and glial processes [48]. Thus, shape measurements are particularly promising phenotypes for studying neurodevelopmental disorders. Neurodegenerative processes and other pathologies, many of which are known to be genetically influenced, can also impact neuroanatomical shape by exerting focal and/or selective insults. For example, in Alzheimer’s disease, morphological alterations in the hippocampus may only target certain subfields [49].

The shape analysis literature offers an expanding list of methods to quantify and characterize shape [42]. A major advantage of the LBS-based shape descriptor [38] employed in this study is that it is robust to intensity variation across scans and does not require the nonlinear spatial registration of the object with a population template, which can be computationally demanding and prone to error. In this paper, we also presented a novel strategy to visualize the principal mode of shape variation across the population. For brain structures with significantly heritable shapes, we demonstrated that the principal mode explains a large portion of the overall shape variation and is often highly heritable. This approach can thus shed light on the global genetic influences on brain structures, and is complementary to studies that rely on nonlinear group-wise registration to characterize localized genetic influences on shape variation.

The heritability analysis of multidimensional traits developed here can be applied to phenotypes other than shape that are intrinsically multivariate. Another application might involve heritability or genetic association analyses combining related traits to obtain more stable effect estimates. For example, it can be used as an alternative to principal component analysis (PCA) and factor analysis when investigating the genetic basis of various psychometric or behavioral traits. Also, voxel- or vertex-level neuroimaging measurements are often noisy, and analyzing these measurements in homogeneous brain regions in a multivariate fashion may increase the reliability and reproducibility of the results. Finally, the genetic similarity matrix can be computed with other SNP grouping strategies (e.g., based on genes, pathways, functional annotations and previous GWAS findings) to model the genetic influences from a specific genomic region or partition the heritability of multidimensional traits, as in Yang et al. [50].

## Online Methods

**Heritability of multidimensional traits**. We start with a brief review of genome-wide complex trait analysis (GCTA) [33, 34], which makes it possible to estimate the heritability of univariate traits due to common genetic variants using genome-wide SNP data from unrelated individuals. Assuming, for the moment, no covariate needs to be adjusted, GCTA follows a linear random effects model:
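$$\mathbf{y} = \mathbf{W}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, \sigma_\beta^2\mathbf{I}_{L\times L}), \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_\epsilon^2\mathbf{I}_{N\times N}), \tag{1}$$

where $\mathbf{y} = [y_1,\cdots,y_N]^\top$ is an $N\times 1$ vector comprising quantitative traits from $N$ individuals, $\mathbf{W}$ is the $N\times L$ mean-centered and standardized genotype matrix, $\boldsymbol{\beta} = [\beta_1,\cdots,\beta_L]^\top$ is a vector of genetic effect sizes, which are independent across genetic loci and have the same variance $\sigma_\beta^2$, $\boldsymbol{\epsilon} = [\epsilon_1,\cdots,\epsilon_N]^\top$ is an environmental factor independent across individuals with homogeneous variance $\sigma_\epsilon^2$, and $\mathbf{I}$ is an identity matrix. The covariance structure of $\mathbf{y}$ can be computed as

$$\mathrm{cov}[\mathbf{y}] = \sigma_\beta^2\mathbf{W}\mathbf{W}^\top + \sigma_\epsilon^2\mathbf{I}_{N\times N} = \sigma_g^2\mathbf{K} + \sigma_\epsilon^2\mathbf{I}_{N\times N}, \tag{2}$$

where $\sigma_g^2 = L\sigma_\beta^2$ is the genetic variance captured by the $L$ common SNPs spanning the genome, and $\mathbf{K} := \mathbf{W}\mathbf{W}^\top/L$ is the empirical genetic similarity matrix for each pair of individuals estimated from genome-wide SNP data $\mathbf{W}$. The SNP heritability of a trait is defined as $h_g^2 = \sigma_g^2/(\sigma_g^2 + \sigma_\epsilon^2)$, which can be estimated by maximizing the likelihood of model (1). The quantity $h_g^2$ measures the proportion of phenotypic variance that can be explained by the aggregated additive effects of genetic variants in the genome. The subscript $g$ indicates that the SNP heritability only captures the genetic variation tagged by the common variants in the data set, and is thus a lower bound for the classical narrow-sense heritability.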
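To make the notation concrete, the toy sketch below builds $\mathbf{W}$ and $\mathbf{K} = \mathbf{W}\mathbf{W}^\top/L$ from simulated allele counts and draws a trait from the GCTA model with $\sigma_g^2 = L\sigma_\beta^2$. The sample sizes, allele frequencies and the target heritability of 0.5 are arbitrary assumptions. Because every SNP column is standardized with the $1/N$ variance, $\mathrm{tr}[\mathbf{K}] = N$ holds exactly, and because every column is mean-centered, the rows of $\mathbf{K}$ sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 500, 2000                                   # toy numbers of individuals and SNPs

# Simulated allele counts (0/1/2) with SNP-specific allele frequencies
freqs = rng.uniform(0.05, 0.5, size=L)
G = rng.binomial(2, freqs, size=(N, L)).astype(float)
G = G[:, G.std(axis=0) > 0]                        # drop monomorphic SNPs, if any
L = G.shape[1]

# Mean-center and standardize each SNP column to obtain W
W = (G - G.mean(axis=0)) / G.std(axis=0)

# Empirical genetic similarity matrix K = W W^T / L
K = W @ W.T / L

# Draw y = W beta + eps with sigma_g^2 = L * sigma_beta^2 = 0.5 and sigma_eps^2 = 0.5
sigma_beta2, sigma_eps2 = 0.5 / L, 0.5
beta = rng.normal(0.0, np.sqrt(sigma_beta2), size=L)
y = W @ beta + rng.normal(0.0, np.sqrt(sigma_eps2), size=N)

print(np.trace(K) / N)                 # 1 up to floating-point rounding, i.e. tr[K] = N
print(np.abs(K.sum(axis=1)).max())     # close to 0: rows of K sum to zero
```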

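We now consider an $M$-dimensional trait $\mathbf{Y} = [\mathbf{y}_1,\cdots,\mathbf{y}_M] = [y_{ij}]_{N\times M}$. We model $\mathbf{Y}$ by a multivariate linear random effects model:

$$\mathbf{Y} = \mathbf{W}\mathbf{B} + \mathbf{E}, \tag{3}$$

where $\mathbf{W}$, as above, is the $N\times L$ mean-centered and standardized genotype matrix, $\mathbf{B} = [\mathbf{b}_1,\cdots,\mathbf{b}_M]$ is an $L\times M$ matrix containing the effect size of each SNP for each trait, and $\mathbf{E} = [\mathbf{e}_1,\cdots,\mathbf{e}_M]$ is an $N\times M$ matrix of environmental factors. We have the following distributional assumptions:

$$\mathrm{vec}(\mathbf{B}) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}_B\otimes\mathbf{I}_{L\times L}), \qquad \mathrm{vec}(\mathbf{E}) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N}), \tag{4}$$

where $\mathrm{vec}(\cdot)$ is the matrix vectorization operator, which converts a matrix into a vector by stacking its columns, $\otimes$ is the Kronecker product of matrices, $\boldsymbol{\Psi}_B$ is the $M\times M$ covariance matrix of the columns of $\mathbf{B}$, and $\boldsymbol{\Sigma}_E$ is the $M\times M$ residual covariance matrix. The two assumptions indicate that the genetic effect sizes are independent across loci and the environmental factors are independent across individuals, but both can be correlated across trait dimensions. When the trait is a scalar, model (3) reduces to the classical GCTA model (1).

Using the properties of vectorization and the Kronecker product, the covariance structure of $\mathbf{Y}$ can be computed as follows:

$$\begin{aligned}\mathrm{cov}[\mathrm{vec}(\mathbf{Y})] &= (\mathbf{I}_{M\times M}\otimes\mathbf{W})(\boldsymbol{\Psi}_B\otimes\mathbf{I}_{L\times L})(\mathbf{I}_{M\times M}\otimes\mathbf{W})^\top + \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N}\\ &= \boldsymbol{\Psi}_B\otimes\mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N}\\ &= \boldsymbol{\Sigma}_G\otimes\mathbf{K} + \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N},\end{aligned}\tag{5}$$

where in the last equality we have defined the genetic covariance matrix $\boldsymbol{\Sigma}_G = L\boldsymbol{\Psi}_B$ and the empirical genetic similarity matrix $\mathbf{K} = \mathbf{W}\mathbf{W}^\top/L$. We note that equation (5) decomposes the covariance structure of $\mathbf{Y}$ into the part that can be explained by genetics, $\boldsymbol{\Sigma}_G\otimes\mathbf{K}$, and the residuals, $\boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N}$. We therefore define the SNP heritability of a multidimensional trait as

$$h_g^2 = \frac{\mathrm{tr}[\boldsymbol{\Sigma}_G\otimes\mathbf{K}]}{\mathrm{tr}[\boldsymbol{\Sigma}_G\otimes\mathbf{K} + \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{N\times N}]} = \frac{\mathrm{tr}[\boldsymbol{\Sigma}_G]}{\mathrm{tr}[\boldsymbol{\Sigma}_G] + \mathrm{tr}[\boldsymbol{\Sigma}_E]} = \frac{\mathrm{tr}[\boldsymbol{\Sigma}_G]}{\mathrm{tr}[\boldsymbol{\Sigma}_P]}, \tag{6}$$

where $\boldsymbol{\Sigma}_P = \boldsymbol{\Sigma}_G + \boldsymbol{\Sigma}_E$ is the $M\times M$ covariance matrix of the columns of $\mathbf{Y}$, and $\mathrm{tr}[\cdot]$ is the trace operator of a matrix. The derivation in equation (6) uses $\mathrm{tr}[\mathbf{A}\otimes\mathbf{C}] = \mathrm{tr}[\mathbf{A}]\,\mathrm{tr}[\mathbf{C}]$ and the fact that, for standardized genotypes, $\mathrm{tr}[\mathbf{K}] = \mathrm{tr}[\mathbf{I}_{N\times N}] = N$. This definition computes the proportion of the total phenotypic variance $\mathrm{tr}[\boldsymbol{\Sigma}_P]$ that can be explained by the total genetic variance $\mathrm{tr}[\boldsymbol{\Sigma}_G]$, and yields a heritability measure that is bounded between 0 and 1. When the trait is univariate, both $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ become scalars, and equation (6) reduces to the classical definition of SNP heritability.

Our definition of heritability is invariant to rotations of the data. For a linear transformation $\mathbf{T}$ applied to model (3), i.e.,

$$\mathbf{Y}\mathbf{T} = \mathbf{W}\mathbf{B}\mathbf{T} + \mathbf{E}\mathbf{T}, \tag{7}$$

the transformed heritability is

$$h_g^2(\mathbf{T}) = \frac{\mathrm{tr}[\mathbf{T}^\top\boldsymbol{\Sigma}_G\mathbf{T}]}{\mathrm{tr}[\mathbf{T}^\top\boldsymbol{\Sigma}_P\mathbf{T}]}. \tag{8}$$

When $\mathbf{T}$ is an orthogonal matrix satisfying $\mathbf{T}\mathbf{T}^\top = \mathbf{T}^\top\mathbf{T} = \mathbf{I}_{M\times M}$, the cyclic property of the trace gives $\mathrm{tr}[\mathbf{T}^\top\boldsymbol{\Sigma}_G\mathbf{T}] = \mathrm{tr}[\boldsymbol{\Sigma}_G]$ and $\mathrm{tr}[\mathbf{T}^\top\boldsymbol{\Sigma}_P\mathbf{T}] = \mathrm{tr}[\boldsymbol{\Sigma}_P]$, so we have $h_g^2(\mathbf{T}) = h_g^2$.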
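The multidimensional definition $h_g^2 = \mathrm{tr}[\boldsymbol{\Sigma}_G]/\mathrm{tr}[\boldsymbol{\Sigma}_P]$ and its rotation invariance are easy to check numerically. In the sketch below, the 3 × 3 matrices standing in for $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ are made-up values; a random orthogonal $\mathbf{T}$ obtained from a QR decomposition leaves $h_g^2$ unchanged, whereas a non-orthogonal rescaling of one component does not.

```python
import numpy as np

def multidim_h2(Sigma_G, Sigma_E):
    """SNP heritability of a multidimensional trait: tr[Sigma_G] / tr[Sigma_G + Sigma_E]."""
    Sigma_P = Sigma_G + Sigma_E
    return np.trace(Sigma_G) / np.trace(Sigma_P)

# Hypothetical genetic and residual covariance matrices (M = 3)
Sigma_G = np.array([[0.6, 0.2, 0.0],
                    [0.2, 0.3, 0.1],
                    [0.0, 0.1, 0.1]])
Sigma_E = np.array([[0.4, 0.1, 0.0],
                    [0.1, 0.7, 0.0],
                    [0.0, 0.0, 0.9]])
h2 = multidim_h2(Sigma_G, Sigma_E)                 # 1.0 / 3.0

# Random orthogonal T from a QR decomposition: h2 is unchanged
rng = np.random.default_rng(0)
T, _ = np.linalg.qr(rng.normal(size=(3, 3)))
h2_rot = multidim_h2(T.T @ Sigma_G @ T, T.T @ Sigma_E @ T)

# Non-orthogonal rescaling of the third component: h2 changes
D = np.diag([1.0, 1.0, 3.0])
h2_scaled = multidim_h2(D.T @ Sigma_G @ D, D.T @ Sigma_E @ D)
print(h2, h2_rot, h2_scaled)
```

The last comparison is a reminder that the measure weights components by their phenotypic variance, so rescaling descriptor entries (for example, re-weighting eigenvalues) changes the quantity being estimated.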

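**An empirical estimator**. Model (3) can in principle be fitted using likelihood-based methods to obtain estimates of the genetic and residual covariance matrices $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$. However, this can be computationally expensive and numerically unstable when the dimension of the trait is moderate and the sample size is limited. Here we derive an alternative moment-matching estimator of $\boldsymbol{\Sigma}_G$. Specifically, the covariance structure in equation (5) gives the following relationship:

$$\mathrm{cov}(y_{ir}, y_{js}) = \Sigma_{G,rs}\,k_{ij}, \qquad i\neq j, \quad r,s = 1,\cdots,M, \tag{9}$$

where $\Sigma_{G,rs}$ is the $(r,s)$-th entry of $\boldsymbol{\Sigma}_G$ and $k_{ij}$ is the $(i,j)$-th entry of $\mathbf{K}$. Therefore, an unbiased estimate of $\Sigma_{G,rs}$ can be obtained by regressing the off-diagonal terms of the cross-product of the mean-centered traits, $\mathbf{H}\mathbf{y}_r\mathbf{y}_s^\top\mathbf{H}$, which is an empirical estimate of the phenotypic covariance matrix of $\mathbf{y}_r$ and $\mathbf{y}_s$, onto the off-diagonal terms of the genetic similarity matrix $\mathbf{K}$. This estimator is known as Haseman-Elston regression in classical heritability analysis [51, 52]; it has recently been extended to handle various study designs, including case-control studies, and in this more general form is termed phenotype correlation-genetic correlation (PCGC) regression [35]. We explicitly write the estimator of $\Sigma_{G,rs}$ as follows:

$$\hat\Sigma_{G,rs} = \frac{\sum_{i\neq j}k_{ij}\,(\mathbf{H}\mathbf{y}_r\mathbf{y}_s^\top\mathbf{H})_{ij}}{\sum_{i\neq j}k_{ij}^2} = \frac{\mathbf{1}^\top[\mathbf{K}_d\circ(\mathbf{H}\mathbf{y}_r\mathbf{y}_s^\top\mathbf{H})]\mathbf{1}}{\mathbf{1}^\top(\mathbf{K}_d\circ\mathbf{K}_d)\mathbf{1}} = \mathbf{y}_r^\top\mathcal{K}\mathbf{y}_s, \tag{10}$$

where $\mathbf{K}_d$ is the genetic similarity matrix $\mathbf{K}$ with all diagonal elements set to zero, $\mathbf{H}$ is a centering matrix with the $ij$-th entry $H_{ij} = \delta_{ij} - 1/N$, $\delta_{ij}$ is the Kronecker delta, $\circ$ is the Hadamard (element-wise) product of matrices, $\mathbf{1}$ is an $N\times 1$ vector of ones, and we have defined $\mathcal{K} := \mathbf{H}\mathbf{K}_d\mathbf{H}/\mathrm{tr}[\mathbf{K}_d^2]$, noting that $\mathbf{1}^\top(\mathbf{K}_d\circ\mathbf{K}_d)\mathbf{1} = \mathrm{tr}[\mathbf{K}_d^2] = \sum_{i\neq j}k_{ij}^2$. Therefore, an estimator of the genetic covariance matrix $\boldsymbol{\Sigma}_G$ is

$$\hat{\boldsymbol\Sigma}_G = \mathbf{Y}^\top\mathcal{K}\mathbf{Y}. \tag{11}$$

We empirically estimate the phenotypic covariance matrix as

$$\hat{\boldsymbol\Sigma}_P = \frac{1}{N-1}\mathbf{Y}^\top\mathbf{H}\mathbf{Y}, \tag{12}$$

and thus from equation (6) we have

$$\hat h_g^2 = \frac{\mathrm{tr}[\hat{\boldsymbol\Sigma}_G]}{\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]} = \frac{\mathrm{tr}[\mathbf{Y}^\top\mathcal{K}\mathbf{Y}]}{\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]}. \tag{13}$$

For scalar traits, equation (13) reduces to the classical Haseman-Elston regression estimator.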
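A compact Python sketch of the moment-matching estimator is given below. It follows the formulas $\mathcal{K} = \mathbf{H}\mathbf{K}_d\mathbf{H}/\sum_{i\neq j}k_{ij}^2$ and $\hat{\boldsymbol\Sigma}_G = \mathbf{Y}^\top\mathcal{K}\mathbf{Y}$, and assumes the normalization $\hat{\boldsymbol\Sigma}_P = \mathbf{Y}^\top\mathbf{H}\mathbf{Y}/(N-1)$. The simulation draws $\mathbf{Y} = \mathbf{W}\mathbf{B} + \mathbf{E}$ with made-up covariance matrices whose true multidimensional heritability is 1/3; the dense $N\times N$ centering matrix is used for readability rather than efficiency.

```python
import numpy as np

def he_multidim_h2(Y, K):
    """Moment-matching SNP heritability of an (N x M) trait Y given an (N x N) similarity K."""
    N = Y.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N            # centering matrix, H_ij = delta_ij - 1/N
    Kd = K - np.diag(np.diag(K))                   # K with its diagonal set to zero
    Kcal = H @ Kd @ H / np.sum(Kd * Kd)            # H K_d H / sum_{i != j} k_ij^2
    Sigma_G = Y.T @ Kcal @ Y                       # genetic covariance estimate
    Sigma_P = Y.T @ H @ Y / (N - 1)                # phenotypic covariance estimate
    return np.trace(Sigma_G) / np.trace(Sigma_P), Sigma_G, Sigma_P

# --- Toy simulation from the multivariate model Y = W B + E ---
rng = np.random.default_rng(0)
N, L = 1000, 2000
G = rng.binomial(2, rng.uniform(0.05, 0.5, size=L), size=(N, L)).astype(float)
G = G[:, G.std(axis=0) > 0]                        # drop monomorphic SNPs, if any
L = G.shape[1]
W = (G - G.mean(axis=0)) / G.std(axis=0)
K = W @ W.T / L

Sigma_G_true = np.array([[0.6, 0.2, 0.0], [0.2, 0.3, 0.1], [0.0, 0.1, 0.1]])
Sigma_E_true = np.array([[0.4, 0.1, 0.0], [0.1, 0.7, 0.0], [0.0, 0.0, 0.9]])
# Rows of B ~ N(0, Sigma_G / L); rows of E ~ N(0, Sigma_E)
B = rng.normal(size=(L, 3)) @ np.linalg.cholesky(Sigma_G_true / L).T
E = rng.normal(size=(N, 3)) @ np.linalg.cholesky(Sigma_E_true).T
Y = W @ B + E

h2_hat, Sigma_G_hat, Sigma_P_hat = he_multidim_h2(Y, K)
true_h2 = np.trace(Sigma_G_true) / np.trace(Sigma_G_true + Sigma_E_true)
print(f"true h2 = {true_h2:.3f}, estimate = {h2_hat:.3f}")
```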

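**Interpretation of multidimensional heritability**. The estimator of the total phenotypic variance can be expressed more explicitly as

$$\mathrm{tr}[\hat{\boldsymbol\Sigma}_P] = \sum_{m=1}^{M}\hat\sigma_m^2, \tag{14}$$

where $\hat\sigma_m^2 = \mathbf{y}_m^\top\mathbf{H}\mathbf{y}_m/(N-1)$ is the estimated variance of the $m$-th component of the multidimensional trait. Moreover, we have

$$\hat h_g^2 = \frac{\sum_{m=1}^{M}\mathbf{y}_m^\top\mathcal{K}\mathbf{y}_m}{\sum_{m=1}^{M}\hat\sigma_m^2} = \sum_{m=1}^{M}w_m\,\hat h_{g,m}^2, \tag{15}$$

where $w_m = \hat\sigma_m^2/\sum_{m'=1}^{M}\hat\sigma_{m'}^2$ with $\sum_{m=1}^{M}w_m = 1$, and $\hat h_{g,m}^2 = \mathbf{y}_m^\top\mathcal{K}\mathbf{y}_m/\hat\sigma_m^2$ is the Haseman-Elston heritability estimate for the $m$-th component. Therefore, our definition of the SNP heritability of multidimensional traits is essentially a weighted average of the heritability of individual components, with weights proportional to their phenotypic variances.

The estimator (13) can also be rewritten as

$$\hat h_g^2 = \frac{\mathrm{tr}[\mathbf{H}\mathbf{K}_d\mathbf{H}\mathbf{L}]}{\mathrm{tr}[\mathbf{K}_d^2]\;\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]} = \frac{\sum_{i\neq j}(\mathbf{HLH})_{ij}\,k_{ij}\,\big/\sum_{i\neq j}k_{ij}^2}{\mathrm{tr}[\mathbf{HLH}]/(N-1)}, \tag{16}$$

where $\mathbf{L} := \mathbf{Y}\mathbf{Y}^\top$ is a linear kernel matrix that quantifies the phenotypic similarity between pairs of individuals. The last equality of equation (16) indicates that the estimator (13) can also be viewed as regressing the off-diagonal terms of the centered phenotypic similarity matrix $\mathbf{HLH}$ onto the corresponding off-diagonal terms of the genetic similarity matrix $\mathbf{K}$, normalized by the total phenotypic variance computed under the specific similarity metric. This opens the possibility of generalizing the definition of heritability to generic metric spaces using kernel tricks.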
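The alternative readings of the estimator can be verified numerically. The sketch below, on simulated genotypes and an arbitrary (non-heritable) four-component trait, computes the same estimate three ways: directly as $\mathrm{tr}[\mathbf{Y}^\top\mathcal{K}\mathbf{Y}]/\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]$, as a variance-weighted average of per-component Haseman-Elston estimates, and as the off-diagonal regression of $\mathbf{HLH}$ on $\mathbf{K}$ divided by $\mathrm{tr}[\mathbf{HLH}]/(N-1)$. The $N-1$ normalization of $\hat{\boldsymbol\Sigma}_P$ is an assumption; the three values agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, L = 300, 4, 800
G = rng.binomial(2, 0.3, size=(N, L)).astype(float)
G = G[:, G.std(axis=0) > 0]                        # drop monomorphic SNPs, if any
W = (G - G.mean(axis=0)) / G.std(axis=0)
K = W @ W.T / W.shape[1]
Y = rng.normal(size=(N, M)) * np.array([1.0, 2.0, 0.5, 1.5])   # arbitrary component scales

H = np.eye(N) - np.ones((N, N)) / N
Kd = K - np.diag(np.diag(K))
Kcal = H @ Kd @ H / np.sum(Kd ** 2)
Sigma_P = Y.T @ H @ Y / (N - 1)

# (a) direct form: tr[Y^T Kcal Y] / tr[Sigma_P]
h2_direct = np.trace(Y.T @ Kcal @ Y) / np.trace(Sigma_P)

# (b) weighted average of per-component Haseman-Elston estimates
var_m = np.diag(Sigma_P)
h2_m = np.array([Y[:, m] @ Kcal @ Y[:, m] for m in range(M)]) / var_m
h2_weighted = np.sum(var_m / var_m.sum() * h2_m)

# (c) kernel form: regress off-diagonal HLH on off-diagonal K, normalize by tr[HLH] / (N - 1)
HLH = H @ (Y @ Y.T) @ H
off = ~np.eye(N, dtype=bool)
slope = np.sum(HLH[off] * K[off]) / np.sum(K[off] ** 2)
h2_kernel = slope / (np.trace(HLH) / (N - 1))
print(h2_direct, h2_weighted, h2_kernel)
```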

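**Statistical inference**. We now derive the sampling variance of $\hat h_g^2$. As pointed out above, the estimator (13) can be formulated under a regression framework. We follow the ideas of Visscher et al. [45] and make two assumptions about this regression: (1) the variance of $k_{ij}$ is small and explains little phenotypic variation, such that the variance of the residuals is approximately equal to the variance of the off-diagonal terms of $\mathbf{HLH}$; and (2) the total phenotypic variance can be estimated with very high precision. Under these assumptions, we have

$$\begin{aligned}\mathrm{var}(\hat h_g^2) &\approx \frac{(N-1)^2}{\mathrm{tr}[\mathbf{HLH}]^2}\cdot\frac{2\,\mathrm{var}_{i\neq j}[(\mathbf{HLH})_{ij}]}{\mathrm{tr}[\mathbf{K}_d^2]}\\ &\approx \frac{2(N-1)^2\left\{\mathrm{tr}[(\mathbf{HLH})^2] - \mathrm{tr}[(\mathbf{HLH})\circ(\mathbf{HLH})]\right\}}{N(N-1)\,\mathrm{tr}[\mathbf{K}_d^2]\,\mathrm{tr}[\mathbf{HLH}]^2}\\ &\approx \frac{2\,\mathrm{tr}[(\mathbf{HLH})^2]}{\mathrm{tr}[\mathbf{K}_d^2]\,\mathrm{tr}[\mathbf{HLH}]^2} = \frac{2\,\mathrm{tr}[\hat{\boldsymbol\Sigma}_P^2]}{\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]^2\sum_{i\neq j}k_{ij}^2},\end{aligned}\tag{17}$$

where the first approximation treats $\mathrm{tr}[\mathbf{HLH}]$ as fixed (assumption 2) and accounts for each pair of individuals entering the regression twice, as $(i,j)$ and $(j,i)$; the second approximates the residual variance by the mean square of the off-diagonal entries of $\mathbf{HLH}$ (assumption 1); and in the last approximation $\mathrm{tr}[(\mathbf{HLH})\circ(\mathbf{HLH})]$, a low order term relative to $\mathrm{tr}[(\mathbf{HLH})^2]$, was dropped and $N(N-1)$ was replaced by $(N-1)^2$. The final equality uses $\mathrm{tr}[\mathbf{HLH}] = (N-1)\,\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]$ and $\mathrm{tr}[(\mathbf{HLH})^2] = (N-1)^2\,\mathrm{tr}[\hat{\boldsymbol\Sigma}_P^2]$. Thus, for a given genotype sample, the estimator (17) depends only on the sample size, through $\sum_{i\neq j}k_{ij}^2$, and the phenotypic covariance.

For scalar traits, $\mathrm{tr}[\hat{\boldsymbol\Sigma}_P^2]/\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]^2 = 1$, and the estimator (17) reduces to $\mathrm{var}(\hat h_g^2) \approx 2/\sum_{i\neq j}k_{ij}^2 \approx 2/(N^2\,\mathrm{var}(k_{ij}))$, which coincides with existing results in the literature [45]. In general, the sample phenotypic covariance matrix $\hat{\boldsymbol\Sigma}_P$ is non-negative definite; let $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_M\ge 0$ denote its eigenvalues. Thus we have

$$\frac{\mathrm{tr}[\hat{\boldsymbol\Sigma}_P^2]}{\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]^2} = \frac{\sum_{m=1}^{M}\lambda_m^2}{\left(\sum_{m=1}^{M}\lambda_m\right)^2} \le 1. \tag{18}$$

This inequality becomes an equality if and only if $\lambda_2 = \cdots = \lambda_M = 0$, i.e., the $M$ traits are all perfectly correlated. Therefore, combining multiple traits never increases, and in general reduces, the variability of heritability estimates relative to analyzing each trait individually.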
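The variance reduction can be illustrated with the approximation $\mathrm{var}(\hat h_g^2) \approx 2\,\mathrm{tr}[\hat{\boldsymbol\Sigma}_P^2]/(\mathrm{tr}[\hat{\boldsymbol\Sigma}_P]^2\sum_{i\neq j}k_{ij}^2)$, where the factor 2 reflects that each unordered pair enters the off-diagonal regression twice. This closed form is our reading of the derivation and should be treated as a sketch. The code compares a scalar trait with five uncorrelated, equal-variance components (ratio 1/5) and five perfectly correlated components (ratio 1), using a genetic similarity matrix built from simulated genotypes.

```python
import numpy as np

def variance_ratio(Sigma_P):
    """tr[Sigma_P^2] / tr[Sigma_P]^2 = sum(lambda^2) / (sum lambda)^2, always <= 1."""
    return np.trace(Sigma_P @ Sigma_P) / np.trace(Sigma_P) ** 2

def approx_h2_variance(Sigma_P, K):
    """Approximate sampling variance 2 tr[Sigma_P^2] / (tr[Sigma_P]^2 * sum_{i != j} k_ij^2)."""
    Kd = K - np.diag(np.diag(K))
    return 2.0 * variance_ratio(Sigma_P) / np.sum(Kd ** 2)

rng = np.random.default_rng(3)
N, L = 500, 1000
G = rng.binomial(2, 0.3, size=(N, L)).astype(float)
G = G[:, G.std(axis=0) > 0]                        # drop monomorphic SNPs, if any
W = (G - G.mean(axis=0)) / G.std(axis=0)
K = W @ W.T / W.shape[1]

M = 5
scalar_se = np.sqrt(approx_h2_variance(np.eye(1), K))
uncorr_se = np.sqrt(approx_h2_variance(np.eye(M), K))        # independent, equal-variance components
corr_se = np.sqrt(approx_h2_variance(np.ones((M, M)), K))    # perfectly correlated components
print(scalar_se, uncorr_se, corr_se)
```

With uncorrelated components the standard error shrinks by a factor of $\sqrt{1/M}$, while perfectly correlated components bring no gain over a single trait, matching the equality condition of the inequality.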

To measure the significance of a heritability estimate, a $p$-value can be computed by conducting a Wald test. Since the null hypothesis, $h_g^2 = 0$, lies on the boundary of the parameter space, the Wald test statistic is distributed as

$$\frac{(\hat h_g^2)^2}{\mathrm{var}(\hat h_g^2)} \sim \frac{1}{2}\chi_0^2 + \frac{1}{2}\chi_1^2, \tag{19}$$

a half-half mixture of $\chi_0^2$, a chi-squared distribution with all probability mass at zero, and $\chi_1^2$, a chi-squared distribution with one degree of freedom [53].
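A Wald $p$-value under the 50:50 mixture only needs the upper tail of $\chi_1^2$, which equals $\mathrm{erfc}(\sqrt{t/2})$, so the Python standard library suffices. The sketch assumes the standard error comes from the sampling variance derived above and, as an assumed convention, maps non-positive estimates to a statistic of zero ($p = 1$), since the mixture applies to the boundary-constrained parameter.

```python
import math

def wald_mixture_pvalue(h2_hat, se):
    """p-value for H0: h2 = 0 when the Wald statistic follows 0.5*chi2_0 + 0.5*chi2_1."""
    stat = (max(h2_hat, 0.0) / se) ** 2   # non-positive estimates -> statistic 0 (assumed convention)
    if stat == 0.0:
        return 1.0                         # the whole mixture lies at or above zero
    # Upper tail of chi2 with one degree of freedom: P(chi2_1 >= t) = erfc(sqrt(t / 2))
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

print(wald_mixture_pvalue(0.3, 0.15))    # ~0.0228
print(wald_mixture_pvalue(-0.05, 0.1))   # 1.0
```

For an estimate of 0.3 with a standard error of 0.15 the statistic is 4, and the mixture $p$-value equals the one-sided normal tail at $z = 2$.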

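Alternatively, permutation inference can be used by shuffling the rows and columns of the genetic similarity matrix $\mathbf{K}$, applying the same permutation to both. For each permutation $r = 1,\cdots,N_{\mathrm{perm}}$, we record the heritability estimate $\hat h_{g,(r)}^2$ computed from the permuted data. Then for an observed heritability estimate $\hat h_g^2$, the permutation $p$-value can be computed as

$$p = \frac{\#\{r : \hat h_{g,(r)}^2 \ge \hat h_g^2\}}{N_{\mathrm{perm}}}. \tag{20}$$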
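The permutation scheme can be sketched as follows. Permuting rows and columns of $\mathbf{K}$ with the same index vector breaks the link between genotype and phenotype while preserving the distribution of pairwise similarities. The estimator inside is the moment estimator written with a mean-centered $\mathbf{Y}$ in place of an explicit $\mathbf{H}$; the simulated data (200 individuals, three components each with heritability 0.8) and the 200 permutations are arbitrary illustration choices, and the $p$-value is the count ratio described above.

```python
import numpy as np

def h2_estimate(Y, K):
    """Moment estimator of multidimensional SNP heritability (mean-centering applied to Y)."""
    N = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                        # equals H Y
    Kd = K - np.diag(np.diag(K))                   # zero the diagonal of K
    genetic = np.trace(Yc.T @ Kd @ Yc) / np.sum(Kd ** 2)
    total = np.trace(Yc.T @ Yc) / (N - 1)
    return genetic / total

def permutation_pvalue(Y, K, n_perm=200, seed=0):
    """Fraction of permuted estimates that are >= the observed estimate."""
    rng = np.random.default_rng(seed)
    observed = h2_estimate(Y, K)
    null = np.empty(n_perm)
    for r in range(n_perm):
        idx = rng.permutation(K.shape[0])
        null[r] = h2_estimate(Y, K[np.ix_(idx, idx)])  # same permutation for rows and columns
    return observed, float(np.mean(null >= observed))

rng = np.random.default_rng(5)
N, L, M = 200, 500, 3
G = rng.binomial(2, 0.3, size=(N, L)).astype(float)
G = G[:, G.std(axis=0) > 0]                        # drop monomorphic SNPs, if any
W = (G - G.mean(axis=0)) / G.std(axis=0)
K = W @ W.T / W.shape[1]
B = rng.normal(0.0, np.sqrt(0.8 / W.shape[1]), size=(W.shape[1], M))
Y = W @ B + rng.normal(0.0, np.sqrt(0.2), size=(N, M))

h2_obs, p_perm = permutation_pvalue(Y, K)
print(h2_obs, p_perm)
```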

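**Modeling covariates**. When covariates or nuisance variables need to be adjusted for, model (3) becomes a multivariate linear mixed effects model:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{W}\mathbf{B} + \mathbf{E}, \tag{21}$$

where $\mathbf{X}$ is an $N\times q$ matrix of covariates, and $\boldsymbol{\alpha}$ is a $q\times M$ matrix of fixed effects. We employ a strategy described in Ge et al. [54] to remove the effects of covariates while keeping the permutation procedure accurate. Specifically, the method computes an $N\times(N-q)$ matrix $\mathbf{U}$ satisfying $\mathbf{U}^\top\mathbf{U} = \mathbf{I}_{(N-q)\times(N-q)}$, $\mathbf{U}\mathbf{U}^\top = \mathbf{P}_0$, and $\mathbf{U}^\top\mathbf{X} = \mathbf{0}$, where $\mathbf{P}_0 = \mathbf{I}_{N\times N} - \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$. The matrix $\mathbf{U}^\top$ projects the data from the $N$-dimensional space onto an $(N-q)$-dimensional subspace:

$$\mathbf{U}^\top\mathbf{Y} = \mathbf{U}^\top\mathbf{W}\mathbf{B} + \mathbf{U}^\top\mathbf{E}, \qquad \mathrm{cov}[\mathrm{vec}(\mathbf{U}^\top\mathbf{Y})] = \boldsymbol{\Sigma}_G\otimes\mathbf{K}^{*} + \boldsymbol{\Sigma}_E\otimes\mathbf{I}_{(N-q)\times(N-q)}, \tag{22}$$

where $\mathbf{K}^{*} = \mathbf{U}^\top\mathbf{K}\mathbf{U}$. We note that $\mathrm{tr}[\mathbf{U}^\top\mathbf{K}\mathbf{U}] = \mathrm{tr}[\mathbf{K}\mathbf{P}_0] \approx N-q$ for unrelated individuals with small genetic similarity. The transformed model has the same form as model (3), and thus all estimation and inferential methods developed above can be applied.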
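One way to obtain a matrix $\mathbf{U}$ with the stated properties is a complete QR decomposition of $\mathbf{X}$: the last $N-q$ columns of the orthogonal factor form an orthonormal basis of the orthogonal complement of the column space of $\mathbf{X}$. This is a sketch assuming $\mathbf{X}$ has full column rank; Ge et al. [54] may construct $\mathbf{U}$ differently, and the covariates below (an intercept, a continuous age-like variable and a binary sex-like variable) are hypothetical.

```python
import numpy as np

def covariate_projection(X):
    """U (N x (N - q)) with U^T U = I, U U^T = P0 and U^T X = 0, via a complete QR of X.

    Assumes X has full column rank q.
    """
    N, q = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")   # Q is an N x N orthogonal matrix
    return Q[:, q:]                           # orthonormal basis of the complement of col(X)

rng = np.random.default_rng(4)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.integers(0, 2, size=N)])  # hypothetical covariates
q = X.shape[1]
U = covariate_projection(X)
P0 = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

Y = rng.normal(size=(N, 3))
Y_star = U.T @ Y        # covariate-free (N - q) x M trait matrix
# A genetic similarity matrix K would be transformed as U.T @ K @ U
print(U.shape, Y_star.shape)
```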

**The Brain Genomics Superstruct Project (GSP)**. The Harvard/Massachusetts General Hospital (MGH) Brain Genomics Superstruct Project (GSP) is a neuroimaging and genetics study of brain and behavioral phenotypes. More than 3,500 native English-speaking adults with normal or corrected-to-normal vision were recruited from Harvard University, MGH, and the surrounding Boston communities. To avoid spurious effects resulting from population stratification, we restricted our analyses to 1,317 young adults (18-35 years old) of non-Hispanic European ancestry with no history of psychiatric illnesses or major health problems (age, 21.54±3.19 years old; female, 53.15%; right-handedness, 91.72%). All participants provided written informed consent in accordance with guidelines set by the Partners Health Care Institutional Review Board or the Harvard University Committee on the Use of Human Subjects in Research. For further details about the recruitment process and participants, and imaging data acquisition, we refer the reader to Holmes et al. [36, 37].

**Genetic analysis**. We used PLINK 1.90 (https://www.cog-genomics.org/plink2) [55] to preprocess the GSP genome-wide SNP data. Major procedures included a sex discrepancy check and the removal of population outliers, spuriously related subjects and subjects with a low genotype call rate (< 97%). Individual markers that had an ambiguous strand assignment or that did not satisfy the following quality control criteria were excluded from the analyses: genotype call rate , minor allele frequency (MAF) , and Hardy-Weinberg equilibrium . After quality control, 574,620 SNPs remained for analysis. We performed a multidimensional scaling (MDS) analysis to confirm that there was no clear population stratification and that no outliers were present in the sample (Supplementary Figure S3). The genetic similarity matrix was estimated from all genotyped autosomal SNPs.

**Laplace-Beltrami spectrum-based shape descriptor**. The intrinsic geometry of any 2D or 3D manifold can be characterized by its Laplace-Beltrami spectrum (LBS) [38, 39], which is obtained by solving the following Laplacian eigenvalue problem (or Helmholtz equation):

$$\Delta f = -\lambda f, \tag{23}$$

where Δ is the Laplace-Beltrami operator, a generalization of the Euclidean Laplacian to manifolds, *f* is a real-valued eigenfunction defined on a Riemannian manifold, and λ is the corresponding eigenvalue. Equation (23) can be solved with the finite element method, yielding a diverging sequence of eigenvalues $0 \le \lambda_1 \le \lambda_2 \le \cdots \to \infty$. An implementation of the algorithm is freely available (http://reuter.mit.edu/software/shapedna). The first *M* eigenvalues of the LBS provide a numerical fingerprint or signature of the object's shape, known as the (length-*M*) “Shape-DNA”.
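To make the Laplacian eigenvalue problem described above concrete, the sketch below discretizes it with linear finite elements on a closed triangle mesh: the cotangent stiffness matrix A and the consistent mass matrix B turn it into the generalized eigenproblem A f = λ B f. This is a simplified stand-in, not the shapeDNA software referenced above (which may use higher-order elements). The function names and the icosahedron test mesh are hypothetical, and a dense solver is used for clarity; large meshes would call for a sparse shift-invert solver such as `scipy.sparse.linalg.eigsh` with a small negative `sigma`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh


def fem_matrices(vertices, faces):
    """Linear-FEM stiffness (cotangent) and consistent mass matrices."""
    n = len(vertices)
    ar, ac, av = [], [], []   # stiffness entries (row, col, value)
    br, bc, bv = [], [], []   # mass entries (row, col, value)
    for tri in faces:
        p = vertices[tri]
        area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        for c in range(3):
            i, j, k = tri[c], tri[(c + 1) % 3], tri[(c + 2) % 3]
            u, v = vertices[i] - vertices[k], vertices[j] - vertices[k]
            # half the cotangent of the angle at k, which is opposite edge (i, j)
            w = 0.5 * np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            ar += [i, j, i, j]
            ac += [j, i, i, j]
            av += [-w, -w, w, w]
        for r in range(3):
            for s in range(3):
                br.append(tri[r])
                bc.append(tri[s])
                bv.append(area / 12.0 * (2.0 if r == s else 1.0))
    # converting COO to CSR sums duplicate entries from shared edges/vertices
    A = sp.coo_matrix((av, (ar, ac)), shape=(n, n)).tocsr()
    B = sp.coo_matrix((bv, (br, bc)), shape=(n, n)).tocsr()
    return A, B


def laplace_beltrami_spectrum(vertices, faces, k=10):
    """First k eigenvalues of A f = lambda B f, ascending (first is ~0 if closed)."""
    A, B = fem_matrices(np.asarray(vertices, dtype=float), np.asarray(faces))
    return eigh(A.toarray(), B.toarray(), eigvals_only=True,
                subset_by_index=[0, k - 1])


def icosahedron():
    """Unit-circumradius icosahedron, used here only as a small test mesh."""
    t = (1 + 5 ** 0.5) / 2
    v = np.array([[-1, t, 0], [1, t, 0], [-1, -t, 0], [1, -t, 0],
                  [0, -1, t], [0, 1, t], [0, -1, -t], [0, 1, -t],
                  [t, 0, -1], [t, 0, 1], [-t, 0, -1], [-t, 0, 1]], dtype=float)
    f = np.array([[0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
                  [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
                  [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
                  [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1]])
    return v / np.linalg.norm(v[0]), f
```

Doubling every vertex coordinate leaves the cotangent weights unchanged and multiplies all triangle areas by four, so every eigenvalue is divided by four. This is the discrete counterpart of the scaling relation between eigenvalues and manifold size used later for volume normalization.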

**Shape analysis pipeline**. We used FreeSurfer version 4.5.0 (http://freesurfer.net) [56], a freely available, widely used, and extensively validated brain MRI analysis software package, to process the structural brain MRI scans and label subcortical brain structures. We then created triangular surface meshes on the boundaries of 20 structures by applying marching cubes to FreeSurfer’s subcortical segmentations. We geometrically smoothed these meshes and solved the eigenvalue problem of the 2D Laplace-Beltrami operator on each of them, yielding the LBS-based shape descriptor for each structure [40]. A Python implementation of this pipeline is freely available (http://reuter.mit.edu/software/brainprint).
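The text does not specify which geometric smoothing was applied to the marching-cubes meshes. Purely as an illustration, the sketch below implements plain umbrella-operator (Laplacian) smoothing. At each iteration every vertex moves a fraction `alpha` of the way toward the centroid of its mesh neighbors, and all vertices are updated simultaneously. The function name and `alpha` are hypothetical, and the FreeSurfer/BrainPrint tools may use a different scheme. This simple scheme also shrinks the surface slightly at each iteration.

```python
import numpy as np


def laplacian_smooth(vertices, faces, iterations=3, alpha=0.5):
    """Umbrella-operator smoothing of a triangle mesh (illustrative only)."""
    v = np.asarray(vertices, dtype=float).copy()
    n = len(v)
    # vertex adjacency derived from the triangle edges
    neighbors = [set() for _ in range(n)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    neighbors = [np.fromiter(s, dtype=int) for s in neighbors]
    for _ in range(iterations):
        # centroid of each vertex's neighbors (isolated vertices stay put)
        centroids = np.array([v[nb].mean(axis=0) if len(nb) else v[i]
                              for i, nb in enumerate(neighbors)])
        v += alpha * (centroids - v)   # synchronous update of all vertices
    return v
```

On a regular octahedron, the neighbors of each vertex average to the origin, so each iteration with `alpha=0.5` halves every coordinate. This makes the shrinking side effect easy to see.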

**Heritability analyses of neuroanatomical shape**. We treated the length-*M* LBS-based shape descriptor of each structure as a multidimensional trait and quantified its heritability. For a closed manifold without boundary, the first eigenvalue is always zero and was thus removed from the analysis. Theoretical and empirical evidence has shown that the eigenvalues grow linearly and their variance grows quadratically [38, 40]. To prevent higher eigenvalues from dominating the phenotypic similarity measure, we re-weighted the *m*-th eigenvalue for the *i*-th subject as [38]:

$$\tilde{\lambda}_{im} = \frac{\lambda_{im}}{m}. \tag{24}$$

This ensures a balanced contribution of lower and higher eigenvalues to the similarity measure. The LBS also depends on the overall size of the structure. To measure genetic influences on shape that are complementary to volume, we further scaled the eigenvalues as:
$$\hat{\lambda}_{im} = V_i^{2/3}\,\tilde{\lambda}_{im}, \tag{25}$$

where $\tilde{\lambda}_{im}$ denotes the re-weighted eigenvalue and $V_i$ is the volume of the structure for the *i*-th subject. Since scaling the eigenvalues by a factor $\eta$ corresponds to scaling the underlying manifold by a factor $\eta^{-1/2}$ [38], the normalization (25) ensures that the volumes of the structure are identical across individuals.
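The two normalization steps can be sketched as follows, under stated assumptions. First, the re-weighting is taken to divide the *m*-th eigenvalue by its index *m*, which counteracts the linear growth noted above, with *m* counted from 1 after the zero eigenvalue is dropped. Second, the volume scaling multiplies by $V_i^{2/3}$. That exponent follows from the $\eta^{-1/2}$ relation: scaling the eigenvalues by $\eta$ scales volume by $\eta^{-3/2}$, so $\eta = V_i^{2/3}$ brings every volume to one. The function name is hypothetical.

```python
import numpy as np


def normalize_shape_dna(eigenvalues, volume):
    """Re-weight and volume-normalize one subject's LBS eigenvalues (sketch).

    eigenvalues: ascending spectrum of a closed surface; first entry is ~0.
    volume: volume V_i enclosed by that surface.
    """
    lam = np.asarray(eigenvalues, dtype=float)[1:]   # drop the zero eigenvalue
    m = np.arange(1, lam.size + 1)                   # assumed 1-based index
    reweighted = lam / m                             # balance low vs high modes
    return volume ** (2.0 / 3.0) * reweighted        # remove overall size
```

If a shape is uniformly scaled by a factor *s*, its eigenvalues are divided by *s*² and its volume is multiplied by *s*³. The product above is therefore unchanged, so the normalized descriptor carries no information about overall size.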

We combined the left- and right-hemisphere copies of each structure and computed the phenotypic similarity matrix from the re-weighted and scaled eigenvalues for the multivariate heritability analyses. We included age, gender, handedness, scanner group, console group, and the top five principal components of the genetic similarity matrix as covariates. To remove potential size effects, we also explicitly included the volume of the corresponding structure as a covariate.
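This section does not state exactly how the two hemispheres are combined, how covariates enter the model, or how the phenotypic similarity matrix is formed from the descriptors. The sketch below shows one plausible construction, and every step is an assumption. It concatenates the left and right descriptors and regresses out the covariates (for example volume and the genetic PCs) by ordinary least squares with an intercept. It then standardizes each residual column and takes the scaled inner product across subjects. The function name is hypothetical.

```python
import numpy as np


def phenotypic_similarity(left, right, covariates):
    """Hypothetical n x n phenotypic similarity from bilateral shape descriptors.

    left, right: (n, M) normalized descriptors; covariates: (n, c) matrix.
    """
    Y = np.hstack([left, right])                        # one bilateral trait
    X = np.column_stack([np.ones(len(Y)), covariates])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta                                    # covariate-adjusted residuals
    R = R / R.std(axis=0)                               # unit variance per column
    return R @ R.T / R.shape[1]
```

Because every residual column has zero mean and unit variance, the diagonal of the resulting matrix averages exactly one. This puts it on the same scale as a standardized genetic similarity matrix.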

The number of eigenvalues incorporated in the LBS-based shape descriptor and the amount of smoothing applied to the surface mesh are key design choices that may affect heritability estimates. In particular, a very small number of eigenvalues may be insufficient to characterize the shape of a structure, whereas very large eigenvalues typically capture fine-scale details that may reflect noise and thus reduce sensitivity. In this study, we report results obtained with 50 eigenvalues in the shape descriptor and 3 iterations of geometric smoothing of the surface mesh. Sensitivity analyses confirmed that our shape analysis results were largely robust to different parameter settings (Supplementary Figure S1).

**Visualizing the principal mode of shape variation**. As shown above, our definition of the heritability of multidimensional traits is a variance-weighted average of the heritability of individual components and is invariant to rotations of the trait vector. Because principal component analysis (PCA) is essentially a rotation of the data, an equivalent definition of the heritability of a length-*M* LBS-based shape descriptor is the variance-weighted average of the heritability of its first *M* principal components (PCs). Since the first PC explains the greatest share of shape variation, it receives the largest weight and thus has the largest impact on the overall heritability estimate of the shape.
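The rotation-invariance argument can be checked numerically. The sketch assumes that the heritability of a multidimensional trait is $\mathrm{tr}(\Sigma_G)/\mathrm{tr}(\Sigma_P)$, where $\Sigma_G$ is the genetic covariance and $\Sigma_P = \Sigma_G + \Sigma_E$ is the phenotypic covariance. This form is consistent with the variance-weighted-average description above. The code compares that quantity with the variance-weighted average of per-component heritabilities along the phenotypic PCs. The covariance matrices are random and are not estimates from this study; all names are hypothetical.

```python
import numpy as np


def multivariate_h2(sigma_g, sigma_e):
    """Heritability of a vector trait: tr(Sigma_G) / tr(Sigma_P) (assumed form)."""
    return np.trace(sigma_g) / np.trace(sigma_g + sigma_e)


def pc_weighted_h2(sigma_g, sigma_e):
    """Variance-weighted average of per-PC heritabilities."""
    sigma_p = sigma_g + sigma_e
    _, Q = np.linalg.eigh(sigma_p)            # columns are the phenotypic PCs
    var_g = np.diag(Q.T @ sigma_g @ Q)        # genetic variance along each PC
    var_p = np.diag(Q.T @ sigma_p @ Q)        # phenotypic variance along each PC
    h2_pc = var_g / var_p                     # heritability of each PC
    return np.sum(var_p * h2_pc) / np.sum(var_p)


def random_spd(rng, d):
    """Random symmetric positive-definite matrix for testing."""
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)
```

The PC variances act as the weights. Whenever the first PC accounts for most of the phenotypic variance, it therefore dominates the average.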

To visualize shape variation along the first PC of the shape descriptor for a given structure, we first aligned the structures from all subjects to a template, *fsaverage*, a population average distributed with FreeSurfer [56], using a 7-parameter registration (global scaling plus a 6-parameter rigid-body transformation) with linear interpolation. Both the individual structures and the template were represented as binary label maps, in which voxels inside the corresponding segmentation label had a value of one and all other voxels had a value of zero. The registration algorithm maximized the overlap, measured with the Dice score [57], between the corresponding label maps (the fixed template and the moving subject map, which was interpolated and thresholded at 0.5). Note that the LBS is invariant to the spatial position and orientation of an object, and that we normalized the shape descriptor for volume in all analyses; this registration therefore has no impact on the results of our heritability analyses. We then created a sample-specific population average of the structure by computing a weighted average of the interpolated subject images. Specifically, each subject was assigned a weight equal to a Gaussian kernel centered at the mean of the first PC and evaluated at the subject’s first-PC score. The width of the kernel was selected such that 500 shapes received non-zero weights. The isosurface of the resulting probability map at 0.5 was used to represent the average shape of the structure, and all visualizations were presented on this surface.
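The kernel-weighted template can be sketched as follows. The text gives only the target count, so two details are assumptions. First, the kernel is truncated so that only the `n_support` subjects nearest to the kernel center (500 in the text) receive non-zero weight. Second, the Gaussian bandwidth is set so that this truncation point lies at two kernel standard deviations. The inputs are binary label maps already aligned to the template. Function and parameter names are hypothetical.

```python
import numpy as np


def kernel_weighted_average(label_maps, pc_scores, center, n_support=500):
    """Gaussian-kernel-weighted mean of aligned binary label maps (sketch).

    label_maps: (n_subjects, ...) array of 0/1 maps in template space.
    pc_scores: (n_subjects,) first-PC scores; center: kernel center on that axis.
    """
    maps = np.asarray(label_maps, dtype=float)
    dist = np.abs(np.asarray(pc_scores, dtype=float) - center)
    n_support = min(n_support, dist.size)
    cutoff = np.sort(dist)[n_support - 1]     # distance of the n-th nearest subject
    sigma = max(cutoff / 2.0, 1e-12)          # hypothetical: cutoff at 2 sigma
    w = np.exp(-0.5 * (dist / sigma) ** 2)
    w[dist > cutoff] = 0.0                    # only the nearest subjects get weight
    return np.tensordot(w, maps, axes=1) / w.sum()   # voxelwise probability map
```

Thresholding the returned map at 0.5 gives the average shape. Passing the mean of the first-PC scores as `center` yields the population average, and passing the mean plus or minus two standard deviations yields the extreme shapes.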

The same Gaussian kernel was used to generate average probability images for shapes centered at the two extremes of the principal axis (±2 standard deviations, SD). These average probability images were offset to achieve identical volumes when thresholded at 0.5. The difference between the two extreme shapes was depicted on the sample-specific population average by visualizing the difference in probability values. Blue indicates regions where the average shape at −2 SD had a higher probability value, and was thus larger, than the average shape at +2 SD; red indicates the opposite.
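The volume matching and difference map for the two extremes might look like the sketch below. The text says only that the probability images were offset to reach identical volumes at threshold 0.5. Here we assume a constant additive offset per image, chosen so that exactly `target_voxels` voxels exceed 0.5. We also take the difference as (−2 SD) minus (+2 SD), so positive values correspond to the blue regions. Names are hypothetical.

```python
import numpy as np


def offset_to_volume(prob, target_voxels, level=0.5):
    """Add a constant so that exactly target_voxels voxels exceed `level`.

    Assumes the target_voxels-th and next-largest values differ (no ties).
    """
    p = np.asarray(prob, dtype=float)
    vals = np.sort(p.ravel())[::-1]                    # descending order
    if not 0 < target_voxels < vals.size:
        raise ValueError("target_voxels must be between 1 and size - 1")
    # midpoint between the last voxel kept and the first voxel excluded
    cut = 0.5 * (vals[target_voxels - 1] + vals[target_voxels])
    return p + (level - cut)


def extreme_difference(prob_minus, prob_plus, target_voxels):
    """Difference of volume-matched extreme shapes (-2 SD minus +2 SD)."""
    a = offset_to_volume(prob_minus, target_voxels)
    b = offset_to_volume(prob_plus, target_voxels)
    return a - b   # > 0: larger at -2 SD (blue); < 0: larger at +2 SD (red)
```

Matching volumes before differencing ensures that the map shows where the two extreme shapes differ in form, rather than a uniform difference in overall size.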

## Acknowledgements

This research was carried out in whole at the Athinoula A. Martinos Center for Biomedical Imaging at the Massachusetts General Hospital (MGH), using resources provided by the Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health (NIH). This work also involved the use of instrumentation supported by the NIH Shared Instrumentation Grant Program; specifically, grant numbers S10RR023043 and S10RR023401.

Data were provided by the Brain Genomics Superstruct Project (GSP) of Harvard University and MGH, with support from the Center for Brain Science Neuroinformatics Research Group, Athinoula A. Martinos Center for Biomedical Imaging, Center for Human Genetic Research, and Stanley Center for Psychiatric Research. Twenty individual investigators at Harvard and MGH generously contributed data to the overall project.

This research was also funded in part by NIH grants K25CA181632 (to MR); K01MH099232 (to AJH); K99MH101367 (to PHL); R01NS083534, R01NS070963, and 1K25EB013649-01 (to MRS); K24MH094614 and R01MH101486 (to JWS); an MGH ECOR Tosteson Postdoctoral Fellowship Award (to TG); the Brazilian National Research Council (CNPq), grant number 211534/2013-7 (to AMW); and a BrightFocus Foundation grant AHAF-A2012333 (to MRS). JWS is a Tepper Family MGH Research Scholar.

## Footnotes

Emails: tge1{at}mgh.harvard.edu; msabuncu{at}nmr.mgh.harvard.edu

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].