A landmark-free morphometrics pipeline: Application to characterise a Down syndrome model skull phenotype

Characterising phenotypes often requires quantification of anatomical shapes. Quantitative shape comparison (morphometrics) traditionally uses anatomical landmarks and is therefore limited by the number of landmarks and operator error when landmarks are located manually. Here we create a landmark-free method and use it to characterise the craniofacial skeletal phenotype of the Dp1Tyb mouse model of Down syndrome (DS), comparing it with a landmark-based approach. Both methods identify cranial dysmorphologies in Dp1Tyb mice, especially smaller size and brachycephaly (front-back shortening) homologous to the human phenotype. However, the landmark-free method provides better separation by principal component analysis, identifies differences where landmarks are absent, and better highlights reductions in mid-face structures that are reduced in humans with DS. Moreover, the landmark-free method is less labour-intensive and requires less user training. We conclude that this is a useful method making morphometrics widely-accessible beyond traditional niches in zoology and palaeontology, especially in characterising mutant phenotypes.


Introduction
Morphometrics, the quantitative comparison of biological shapes, is well established in the fields of palaeontology and evolutionary biology 1 and is emerging in genetic studies to quantify and understand morphological phenotypes. Current morphometric methodologies, while powerful, have limitations that reflect a compromise between precision and ease-of-use.
Modern morphometrics, otherwise known as geometric morphometrics, uses anatomical landmarks to describe morphological structures and uses computational statistical methods to compare the landmark locations. Landmarks can be histological boundaries (e.g. the intersection between two skull sutures), geometric (e.g. the tip of a bone or the bottom of a sulcus), or relative (e.g. the point on a curved histological boundary furthest away from another landmark). Typically, some tens of landmarks are placed, usually manually, on digital two-or three-dimensional tomographic images (obtained by photography, X-ray or MRI methods) and are then analysed through Euclidean distance matrix analysis (EDMA) or Procrustes Superimposition (PS) 2 .
Landmark-based methodologies have limitations. Firstly, they are not easy to apply. There are occasions where an anatomical landmark may be absent from an individual due to natural variation, engineered mutation or pathology. At best this is a nuisance and at worst a landmark of biological relevance is lost, potentially affecting the analysis and interpretation of the data.
A second limitation is that smooth surfaces do not have easily defined landmarks. Semilandmarks, which are points evenly spaced along an outline or curve between two landmarks 3 can reduce this problem to a degree, and a number of computational advances have enabled them to be determined automatically 4,5 . Unfortunately, semi-landmark contours may still leave gaps, potentially missing variation in doming or depression in the triangular surfaces between nearest-neighbour landmark trios 6 . This is a particular problem in soft tissues and embryos, with numerous essentially featureless, smoothly curved surfaces. A third limitation of current landmark-based methods is operator error. Landmark sets for early embryonic mouse head Toussaint et al 4 anatomy are available but were shown to be highly susceptible to both inter-and intra-operator error 7 , not just in embryos but also more widely 8 . A study by Shearer et al 9 found that the variance in landmarking Tibetan macaque skulls was equal to or larger than the natural variability found between subjects. In summary, the combined problems of landmark variability, landmark sparseness and operator labour and reliability must be overcome if morphometrics is to be applied widely, especially as a tool to understand the genetics of dysmorphologies.
One of the most common human dysmorphologies is the craniofacial phenotype associated with Down syndrome (DS). Individuals with DS, currently ~1 in 800 births 11 , have characteristic features -flattened midface with low nose bridge, front-to-back shortened skull (brachycephaly) and slightly hooded eyelids 10 . Although the craniofacial features affect everyone with DS, this phenotype is not well understood either genetically or developmentally.
DS is caused by trisomy of human chromosome 21 (Hsa21) which carries 232 protein-coding genes (Ensembl genome assembly GRCh38) 11,12 . It is thought that the presence of a third copy of one or more of these genes gives rise to the individual defects observed in DS, but the critical dosage-sensitive genes are not known [13][14][15] .
Hsa21 is orthologous to three regions of the mouse genome, located on mouse chromosome 16 (Mmu16), Mmu10 and Mmu17; the largest region on Mmu16 contains 148 protein-coding genes. To model DS, mouse strains have been engineered that carry an extra copy of each of these regions, and between them they recapitulate many aspects of DS 14,16,17 .
Morphometrics applied to some of these models, including Ts65Dn 18-20 and Dp(16)1Yey mice 21 showed that trisomy of part or all of the Hsa21-orthologous region of Mmu16 resulted in craniofacial dysmorphology which resembled the DS phenotype. The cranial dysmorphology in Dp1Yey mice was highly consistent (with multiple linear distances between landmarks differing statistically significantly from wild type in all regions measured 21 ). Yet the dysmorphic Toussaint et al 5 phenotype was remarkably subtle, with an average landmark-to-landmark distance difference of only 7% between mutant and wild-type (WT) control mice 21 .
Landmark-free methods have been developed by the neuroimaging community to quantify the size and shape of the brain precisely because its relatively smooth shape hampers the definition of reliable landmarks 22,23 . However, these methods have not been directly compared to the landmark-based approach. In this paper, we describe a convenient pipeline for landmark-free morphometric analysis based on an approach used for brain imaging 24 . We compare our method to the traditional landmark-based morphometric approach, focusing on the characterisation of the craniofacial phenotype of the Dp1Tyb mouse model of DS which has an additional copy of the entire Hsa21-orthologous region of Mmu16 (ref 14 ). We find that the landmark-free analysis gives separation by shape between Dp1Tyb and WT mice that is at least as clear as that achieved by landmark-based analysis, while delivering a number of operational advantages and identifying phenotypic changes not seen in landmark-based analysis. Additionally, we show that the landmark-free method provides a rich output that can be used for higher-resolution statistical and spatial analysis, providing an approach to interpret shape modifications between groups as local tissue effects.

Landmark-based and landmark-free analysis of Dp1Tyb skulls
We used micro-computed tomography (µCT) to acquire images of the skulls of 16-week old WT and Dp1Tyb mice with which to compare the standard landmark-based analysis with our novel landmark-free analysis method. We carried out landmark-based analysis in the conventional way 25 , marking the location of 68 landmarks on the cranium and 17 on the mandible ( Supplementary Fig. 1). Crania and mandibles were analysed separately since their relative position varies from subject to subject. Landmarks for all crania and mandibles were Toussaint et al 6 aligned using Procrustes Superimposition, and these data were used for further statistical analysis of size and shape.
For the landmark-free approach we developed a pipeline based on previous approaches in morphometrics and neuroimaging (Fig. 1, Table 1, Supplementary Appendix 1). Following thresholding to extract the skull structures from the µCT images, cartilaginous structures were removed ( Supplementary Fig. 2) and the images segmented using bone density to separate the mandibles from the crania (step 1). Meshes were generated from the surfaces of the cranium and mandible for all subjects (step 2), aligned using Procrustes Superimposition for scaling (step 3) and these were used for the construction of an atlas (mean shape) for the crania and mandibles of the WT and Dp1Tyb skulls (step 4). Atlas construction was based on the Deformetrica algorithm 24 which, in brief, consists of construction an average mesh of the population, using the mesh to define a flow field that conforms to its shape and quantifies deformations from it to each subject recorded as momentum vectors (momenta -see Methods and Supplementary Appendix 1). The initial output from this atlas consisted of the average mesh for the whole population, a set of control points corresponding to areas with the greatest variability between subjects, and momenta for each control point describing the directional variation of the shape from the average. The average mesh, the control points and the momenta were used for further statistical analysis, with the momenta applied to deform the population average mesh to generate average meshes for each of WT and Dp1Tyb groups preserving one-to-one correspondence of mesh vertices. We performed principal component analyses and used a multiple permutations test on a stratified k fold cross validation classifier to test for significance. To control for overfitting (a risk when the number of measurements substantially exceeds the number of subjects), we compared the PCA difference vector magnitude between the two genotype groups with that of 1000 randomly scrambled groups.
We found that the distribution was Normal and that the genotype difference vector was more than 3.5 standard deviations away from the mean vector of the 1000 scrambled groups for both cranium and mandible. 7

Dp1Tyb mice have smaller crania and mandibles
Centroid size (the mean absolute landmark distance from the landmark-defined centroid) is the most commonly used measure of size in landmark-based geometric morphometrics 26 . We used it here to determine if there were any changes in skull and mandible size between the DS model and controls. Landmark-based comparison of centroid sizes by genotype found that the crania and mandibles of Dp1Tyb mice were both significantly smaller than those of WT mice ( Fig. 2A, C), recapitulating the overall reduction in skull size found in humans with DS and as well as in other models of DS [18][19][20][21]27 . The landmark-free analysis also showed that  Table). The landmark-free and landmark-based centroid sizes were different, unsurprisingly given the many extra measurements use in the landmark-free method (~19,000 mesh vertices versus 68 landmarks for cranium and ~16,000 vertices versus 17 landmarks for mandible).

Dp1Tyb mice have altered shapes of crania and mandibles
Both the size difference and gross shape differences were clearly visualised by animated morphing between the mean shapes of WT and Dp1Tyb specimens (generated in the landmark-free pipeline) for the cranium and the mandible (Supplementary Videos 1, 2). The overall decrease in size going from WT to Dp1Tyb crania or mandibles was readily apparent and some shape changes could also be seen, although the latter were more subtle.
To quantify shape differences, these were separated from the size differences by using Procrustes alignment, i.e. scaling the data to equalise centroid sizes. To discover residual shape differences between genotypes, we used principal component analysis (PCA) and applied well established significance tests as described in Methods. Both landmark-based and landmark-free methods showed a statistically significant differences in shape between Toussaint et al 8 Dp1Tyb and WT mice in both crania and mandibles ( Fig. 2E-H). Plots of the first two Principal Components identified by the two different methods looked similar, with tighter clustering of specimens for cranium than for mandible.

Dp1Tyb mice recapitulate aspects of human DS craniofacial dysmorphology
To characterise the shape differences in more anatomical detail and to be able to compare the differences in shapes found through the landmark-based method, we used two approaches. First, we simply overlaid the mean landmark configurations from Dp1Tyb and WT crania and mandibles ( The reduced snout and facial widening in combination with the overall smaller size of the Dp1Tyb crania mimics the "mid-face hypoplasia" of human DS. The Dp1Tyb mandibles had a Toussaint et al 9 small shape change in the alveolar ramus region and the condylar process but these changes were all subtle (less than 5 µm in magnitude) (Fig. 3K, L).
Next we made heatmaps and morphing movies based on the higher-resolution landmark-free method. Displacement maps together with the morphing movies visualised the distances between the two mean meshes. We plotted net displacement rather than movement towards or away from the shape centroid so that only one colour was needed in the maps, using videos to show the direction of differences. The landmark-free analysis showed changes mostly similar to those found using the landmark-based method including the same doming of the neurocranium, slight contraction of the posterior basicranium and shortening of the nasal and maxillary processes (

The landmark-free method provides new shape change information: in-plane deformation
Mesh displacement heatmaps primarily represent shape changes that are normal (perpendicular) to the surface of the specimen. However, shape differences arise in development due to different localised growth which is often in the plane, e.g. in planar skull bones (calvaria), and potentially remote from the location of displacement, e.g. proliferation in the growth plates of long bones displacing their tips. Mechanistically it would therefore be useful to capture surface "stretch" as a measure of local growth differences between specimens. This becomes feasible in a landmark-free method where a high density of control points is used to guide an even higher density of mesh vertices. Thus, we calculated and mapped local differences in mesh vertex spacing ( There is also contraction in the underside of the auditory bulla (Fig. 5C,   arrowheads). In the mandible the main change, a ~10% contraction in the ramus region just anterior of the angular process, corresponds closely to the displacement map, but in addition, the stretch mapping showed a very slight expansion in the diastema (gap between molars and incisor) (Fig. 5E, F). Toussaint et al 11

Discussion
In this paper, we have presented an adaptation and incorporation of the "currents"-based landmark-free shape comparison methodology 24 into a pipeline that can process µCT images into a form (i.e meshes) suitable for this method, and easily compare shapes and provide statistical and other data analyses comparable to more traditional geometric morphometrics.
We have used this pipeline to analyse the previously unexamined craniofacial phenotype of Dp1Tyb mice, a relatively new model of DS 14 . We compared the results with a well-established landmark-based method that is commonly used for skeletal shape analysis, particularly in the comparative zoology and paleontology fields 1 . The newer method yielded new biological insights and offers a great deal of promise for use in phenotyping, including for genetic models.
We validated the landmark-free approach through comparison to the classical landmarkbased geometric morphometric approach and found similar results; both revealed the novel finding that Dp1Tyb mice present highly consistent size and shape differences when compared to WT mice and these differences parallel those of the craniofacial phenotype observed in human individuals with DS 27-29 and in previously described DS mouse models [18][19][20][21] . Both methods also allowed the Dp1Tyb phenotype to be well separated in feature space from WT mice using one or two principal components and both revealed the significantly reduced size of the cranium and mandible of Dp1Tyb mice. We found that the DS model mice show brachycephaly, resolved in the analysis to an overall size reduction plus cranial doming.
We also found that the landmark-free approach can provide new dense information about the shape changes in Dp1Tyb mice, allowing us to see changes not visible using the landmarkbased approach. Most strikingly, we were able to observe a shape difference in the lowerposterior mandible, where landmarks are absent, and in the snout, cranial base and palate, where landmarks are more abundant but possibly not dense enough to capture the localised in-plane differences. These latter changes in particular indicate homology with the mid-face hypoplasia found in humans with DS.

12
Methodologically, the landmark-free method has three clear advantages. Firstly, the landmarkfree method overcomes the limitations of manual placement of landmarks which is susceptible to inter-and intra-operator error 30 . Secondly, although the landmark-free method requires some manual input in the early stages, particularly in determining image thresholding and in cleaning up imperfect anatomical segmentation, it is substantially less labour-intensive and requires less user training than the landmark-based approach. Thirdly, and perhaps most scientifically significant, the landmark-free method provides much higher resolution and information density than the landmark-based method. In principle, the landmark-free method offers arbitrarily high resolution. In practice we found that decimating the initial mesh from ~2,800,000 and ~200,000 to ~19,000 and ~16,000 vertices and using a kernel size of 2mm to yield 940 and 350 control points for the cranium and mandible respectively 24 , captured the interesting anatomical features at high density while avoiding noise, e.g. trivial surface texture differences. Different sizes of specimen will have different optimal spatial parameters. The high density of control points was further refined by having them clustered algorithmically at regions of high variability between samples. This might be contrasted with the inherent bias in landmarking that tends to place shape differences close to landmarks (observable in, for example, Fig. 3A, B). It can also be contrasted with the use of semi-landmarks, which adds landmarks that are evenly distributed across contours or surfaces that are bounded by identifiable landmarks 31,32 but still potentially leaves gaps where landmarks are sparse. A trade-off for all methods that increase the number of discrete observations, however, is the increasing as risk of overfitting as the number of variables (e.g. landmarks or voxels) increases relative to the statistical degrees of freedom, or the "curse of dimensionality" 33 . Controls for overfitting, as we have applied, are therefore, where possible, especially necessary.
Displacement heat maps reveal any kind of displacement, whether in-plane or perpendicular to it. In capturing the shape difference between a subject and control, the landmark-free method retains all the vertices of the mesh. This means that local "stretch" or "compression" Toussaint et al 13 can be mapped as vertex density and are captured in our stretch maps. In effect, each vertex is a landmark in a highly granular landscape. The very short spatial scale of this mapping is likely to be a much better way to capture and localise changes in biologically causal processes, such as cell proliferation or extracellular matrix expansion, compared to net displacement, where the cause could be hundreds of cell diameters away.
The stretch data has both refined the changes found, e.g. in the case of the cranial base showing how the auditory bulla are contracting, and provided new details in the case of the contraction of the inter-sphenoidal synchondrosis and the zygomaticotemporal suture. This type of new information is vital in accurately dissecting the causal developmental changes leading to altered anatomy. We have shown for the first time that not only are the zygomatictemporal sutures displaced laterally (Fig. 3B, C), relative to the rest of the skull, but have a localised contraction suggesting a failure to grow in early development either pre-or postnatally (Fig. 5B, C). It has been suggested that this type of insufficiency at this site could lead to facial dysmorphologies 34, 35 .
The landmark-free method has previously demonstrated its strength in the analysis of objects with a paucity of landmarks, such as MRI scans of brains where it was first described 24 . We have applied it to a much more landmark-rich type of specimen with two important conclusions.
Our first conclusion is that using a landmark-free approach is still advantageous even when traditional landmarking is possible. The second conclusion is that for those working with mouse models in particular, this is a readily useable tool. We believe this pipeline provides a portable tool that can be set up with modest computational expertise that will enable researchers to tackle questions of mutant phenotypes that traditional methods have struggled with, such as in early developmental stages or other biological forms that lack well defined landmarks. However, 4 subjects were excluded from the analysis due to fractures in either the mandible or skull. Heads were prepared for micro-computed tomography (µCT) by fixation in PFA and then scanned at a 25 µm resolution using a µCT 50 (Scanco).

Landmark Acquisition
Three-dimensional locations of 68 anatomical landmarks for the cranium and 17 landmarks for the mandible (Supplementary Fig. 1) as previously defined by Hallgrimsson and colleagues 36 were placed onto 3D reconstructions of µCT images using Microview (Parallax Innovations). Landmarks were placed manually and verified by checking orthogonal planar views of the subject.

Statistical analysis of landmark dataset
Landmarks for all subjects were aligned using Procrustes superimposition, and distances between landmarks for each subject were analysed by Principal Component Analysis (PCA) using MorphoJ 37 to visualise group separation by shape. To quantify significance in shape differences between genotypes we used the Procrustes Distance Multiple Permutations test

Landmark-free morphometric analysis
As an alternative to landmark population comparisons, statistical analysis of anatomical shapes can be achieved using so-called atlas-based approaches, which consist of estimating an anatomical model (i.e. template) as the mean of a set of input shapes (rather than point clouds) and quantifying its variation in a test population as deformations. This was previously achieved in a reproducible and robust landmark-free manner by Durrleman et al 24 . This approach bypasses a number of problems associated with mesh point-to-point comparison by representing deformation between shapes as the diffeomorphic transformation of flow fields, i.e. currents over the mesh surface. Currents are parameterised by a set of control points in space and initial velocities, or momenta. By means of a gradient descent optimisation scheme, the method is able to produce a statistical atlas of the population of shapes. An atlas refers to Toussaint et al 22 a mean template shape, a set of final control point positions, and momenta parameterising the displacements between the template to each initial individual shape. In the following sections we describe the different steps to achieve such analysis.

Landmark-free Atlas Construction
Step 1 (Fig. 1, Table 1, Supplementary Appendix 1). Despite the fact that images acquired using µCT show good bone contrast, they often include the presence of artificial objects (noise and debris in the specimen), small holes and cartilage that need be excluded in order to obtain consistently comparable final surface meshes. To extract the surface meshes, a series of image processing steps were applied. After a thresholding operation to extract the skull, "morphological opening" and "closing" were performed on the binary mask to remove internal cartilage structures ( Supplementary Fig. 2). Removal of spurious objects was achieved by clustering, categorising all of the connected components in an image by size and retaining only the largest component. Skulls were segmented using bone density to isolate the mandible (whose density is higher than that in the rest of the skull). However, this segmentation of the mandible can happen improperly and may include parts of the temporal bone which must be cleaned and removed manually. Mandible meshes were extracted from the skull binary mask semi-automatically using Watershed segmentation 41 .
Step 2. The meshes were produced using marching cubes on the binary images, followed by a surface Laplacian smoothing 42 and finally decimation of a factor of 1/80 to decrease mesh resolution and reduce overall computation time.
Step 3. The atlas construction necessitates the production of aligned meshes from the input µCT images as a pre-processing step. Minimally 3 pairs of landmarks were chosen to align meshes using a Procrustes technique 43 using a rigid body-plus-scaling transformation model (similarity alignment) or a rigid body transformation without scaling (rigid body alignment).
Step 4. As previously described 24 , the atlas construction major hyper-parameters consist of (1) the size of the Gaussian kernel used to represent shapes in the varifold of currents, denoted σW, and (2) the number of control points, denoted Ncp. Control points can be thought of as unbiased landmarks, initially they are spaced on a regular grid evenly but move to areas of greatest variation. Thus as much of the occurring shape change is captured in an unbiased manner. σW can be seen as the precision at which the shape deformation is described. Ncp denotes the sampling density in space. In our experiments, we fixed σW = 2 mm. Using that scale, the derived number of control points Ncp was 350 and 940 for the mandibles and the cranium analyses respectively. All subjects' meshes were used for the construction of the atlas. The optimisation procedure typically converged within 150 iterations in approximately 20h on a 10 core cpu for the mandible. The atlas construction produces the following outputs: • Control Points: For each subject, the Ncp final control points locations {c}, that correspond to areas that describe the population's most important variability in shape.
• Momenta: For each subject, the Ncp final momenta {m} associated to each control point, that describe the shape directional variation from the population average.
• Template: Mean shape of the population.

Landmark-free dataset statistical analysis
Step 5 (Fig. 1). The atlas outputs provide a dense amount of information that can be used for various statistical analyses. Centroid size was calculated as the square root of the sum of the squared distances from each mesh node to the centroid of all nodes of a given specimen.
Non-linear Kernel PCA with dimension 5 was applied to the set of momenta produced from the atlas of the population, in order to find the principal modes of variation of the entire population. The resulting output provided a way to compare these results with the landmarkbased PCA analysis. We projected the subjects onto the feature space for comparison purposes. Such projection provides dense information of shape differences between the two sub-populations. Local magnitude of the momenta interpolated at the template mean mandible (or cranium) mesh point locations allow for additional qualitative interpretation of shape Toussaint et al 24 differences between groups. Stratified k-fold cross-validation analysis was performed on the PCA data to evaluate the statistical power of classification between the two groups.
Significance of the classification score was tested using a multiple permutations test at 1000 iterations 44 . To assess overfitting, subjects were randomly partitioned into two groups ("scrambled groups") and PCA analysis performed to generate inter-group vectors. The distribution of the vector magnitudes was tested for Normality using the Shapiro-Wilk test.

Code availability
Python scripts and documentation for the landmark-free morphometric analysis are freely available on GitHub (https://gitlab.com/ntoussaint/landmark-free-morphometry ) and their use is also described in Supplementary Appendix 1. Step 1, extraction of region of interest. Initial thresholding of µCT image was used to make a binary mask and regions of the mask were separated by bone density using secondary thresholding, with some manual clean-up based on known anatomy. A region of interest (in this example the mandible in red)

Figure Legends
was chosen for further analysis.
Step 2, mesh generated and decimated by a factor of 0.0125 to reduce data file size.
Step 3, meshes of all subjects aligned either using rigid body alignment with no scaling or similarity alignment with scaling.
Step 5, statistical analysis and visualisation of shape data.     Step 2.

Alignment points
Reference points used to align meshes MITK 48 Mesh Alignment