Abstract
Variations in neuronal morphology among cell classes, brain regions, and animal species are thought to underlie known heterogeneities in neuronal function. Thus, accurate quantitative descriptions and classification of large sets of neurons are essential for functional characterization. However, unbiased computational methods to classify groups of neurons are currently scarce. We introduce a novel, robust, and unbiased method to study neuronal morphologies. We develop mathematical descriptors that quantitatively characterize structural differences among neuronal cell types and thus classify them. Each descriptor that is assigned to a neuron is a function of distance from the soma with values in the real numbers or in more general metric spaces. Standard clustering methods, enhanced with detection and metric learning algorithms, are then used to objectively cluster and classify neurons. Our results illustrate a practical and effective approach to the classification of diverse neuronal cell types, with the potential for discovery of putative subclasses of neurons.
1. Introduction
Neuronal morphology dictates how information is processed within neurons [1], as well as how neurons communicate within networks [2]. Thus, given the large diversity in dendritic morphology within and across cell classes, quantifying variations in morphology becomes fundamental to elucidating neuronal function. The two major classes of neurons in the neocortex are principal cells (pyramidal cells) and GABAergic interneurons. Pyramidal cells play a critical role in circuit structure and function, and are the most abundant type in the cerebral cortex (70-80% of the total neuronal population) [3]. The morphology of pyramidal cells can vary substantially among cortical areas within a species [4, 5, 6, 7], and across species [8, 9]. Similarly, neocortical GABAergic interneurons are important in shaping cortical circuits, accounting for 10-30% of all cortical neurons [10, 11]. Classification of GABAergic interneurons has proved to be especially challenging due to their diverse morphological, electrophysiological, and molecular properties [12, 13]. Importantly, morphological differences among classes and subclasses of pyramidal cells and interneurons are presumed to be functionally relevant. Moreover, changes in neuronal morphology are thought to underlie various neurodevelopmental [14] and acquired [15, 16, 17, 18, 19] disorders. Thus, given the key role of pyramidal cells and interneurons in cortical function in health and disease, it is important to differentiate among their subclasses through rigorous classification tools.
Standard approaches rely on measurements of morphological features typically acquired from digital neuron reconstructions. Feature measurements are subsequently used to quantitatively assess and cluster cell classes [20], using standard supervised [13] and unsupervised [21, 22, 23] clustering algorithms. However, the raw quantification of features provided by standard methods often fails to discriminate among neuronal classes that are visually very different (§2). Additionally, these methods have only been tested on select datasets. Therefore, there is a demand for more robust and general methods for discriminating among diverse neuronal cell types and larger datasets. In recent years, the field of computational topology has become increasingly popular for the characterization of tree structures, including neurons. For example, [24] developed a new algorithm called the ‘Topological morphological descriptor’ (TMD), which is based on topological data analysis (i.e. persistence diagrams), to classify families of neurons. In a more recent study [25], the authors use the TMD to classify cortical pyramidal cells in rat somatosensory cortex. The topological classification was largely in agreement with previously published expert-assigned cell types. Furthermore, [26] present a framework based on persistent homology to compare and classify groups of neurons. Nevertheless, the available methods fail to fully capture the subtle morphological differences among families of neurons, as we illustrate in the present work.
Here we take a novel approach to the classification of neurons. We view a morphological feature describing a neuron as a function which takes values in the real numbers, or more generally in some relevant metric space, and varies as a function of distance from the soma. Each morphological feature, such as tortuosity, taper rate, branching pattern, etc., gives rise to what we refer to as a Sholl descriptor. A Sholl descriptor is a rule that assigns a metric element, such as a number or a persistence diagram, to every given neuron at any given distance from its soma. Value assignments are normalized so that all neurons are represented on a comparable scale, and are isometry invariant and stable (§4.10). The construction we just outlined endows the set of neurons with a descriptor metric for every Sholl descriptor. Our approach is useful in that every morphological feature turns the set of neurons into a metric space: the closer two neurons are in the underlying descriptor metric, the more of this feature they share. This method gives a powerful, objective, and interpretable tool to compare and analyze neuronal morphologies.
In the course of our work, we developed eight Sholl descriptors, representing eight features, which we then use in both unsupervised and supervised settings to cluster and classify families of neurons. Using diverse datasets, we identify key descriptors that reveal differences and similarities between neuronal classes. Our method separated different neuronal cell types significantly better than clustering methods based on raw quantification of features. Certain descriptors result in complete separation of selected groups of neurons. Thus, our highly effective and powerful classification tool could be used for the identification of new neuronal cell types, ultimately enhancing our understanding of the morphological diversity and function of neurons in the brain.
2. Results
We developed eight descriptors based on the following morphological or morphometric features: branching pattern, tortuosity, taper rate, wiring, flux, leaf index, energy, and the parameterized TMD descriptor. A schematic of each descriptor is illustrated in steps a-h in Fig. 1, and all terms are defined in §4.3. We illustrate the discriminative accuracy of our Sholl descriptors on six different datasets, providing evidence that relevant Sholl descriptors can reliably discriminate among different classes of neurons in agreement with previously published assignments. All datasets were downloaded from NeuroMorpho.org [27]. They were chosen to cover diverse types and subtypes of neurons across different regions and animal species. For each dataset, we predefined classes and subsequently performed clustering analysis in several different ways.
Figure 1: Reconstructions of neuronal cell types are encoded in Sholl descriptors, which are then used for clustering and classification. A representation of each Sholl descriptor is depicted in steps a-h. The process of clustering different neuronal cell types is shown in steps 1-6. Once neurons are vectorized based on descriptor metrics, we apply metric learning techniques to obtain classification (steps i-iii).
Detection, clustering and classification methods
In this paper we present a toolkit of descriptors, and describe their method of implementation. Sholl descriptors are run one at a time on each dataset, detection rates are computed, and dendrograms based on cluster analysis are generated. Descriptors can then be combined to optimize clustering and obtain classification.
To streamline terminology, saying that a class C is detected at a 100% level by a descriptor ϕ means that the entire class fits in a ball under the descriptor metric, and no neuron from any other class is within that ball. Detection reveals how well a given feature is able to single out the class C from all other classes. Detection rates are correlated with the corresponding dendrogram obtained via cluster analysis: a high detection rate leads to the neurons in the class clustering together in the accompanying dendrogram. For convenience, we say a class has been detected by a descriptor ϕ if the rate of detection for that class is at least 90% (details in §4.7).
Neuronal clustering can be achieved using a single descriptor or a combination of descriptors. We present three different ways of combining the descriptors, each with its own merits. The first combination method is unsupervised in that it does not depend on the way we subdivide our dataset into classes (§4.13); it is used to analyze dataset 1. The second method combines features through a grid-search algorithm that produces a linear combination of descriptor metrics capable of differentiating among classes (§4.14). If no such combination can be produced, the classes are indistinguishable under our morphological descriptors. This method is used to analyze dataset 5. The third combination method is a supervised technique achieved by means of metric learning, and yields neuronal classification when applied to a large dataset; it is used to analyze dataset 6. This method works by first selecting relevant features based on their detection rates. Neurons are then vectorized using the descriptor metrics. A metric learning algorithm is then applied and produces an optimal metric that differentiates the classes. This method is subsequently checked for ‘overfitting’ (§4.16). The new metric is then used to measure how close in features a new neuron is to the given classes.
Feature selection and clustering: L-measure versus Sholl descriptors
As a proof of concept, we implemented our Sholl descriptors on dataset 1, which comprised three different neuronal cell types in the mouse brain: retinal ganglion cells (n=10), cerebellar Purkinje cells (n=9), and interneurons in the medial prefrontal cortex (n=10). The aim was to choose strikingly different neuronal cell types that should be easy to cluster. A representative neuron from each type is shown in Fig. 2a-c. We applied seven descriptors to this set (all but taper rate), assigned Sholl functions to neurons, and computed distances for each descriptor. As shown in the detection rates of Table 1, the performance of all descriptors was optimal in that each class was completely detected by at least two descriptors. Remarkably, the branching pattern descriptor detected all three classes, suggesting that this descriptor alone is sufficient to classify this particular dataset. The parameterized TMD descriptor (TMD Sholl) performed as well as the other descriptors, and better than its classical version. The dendrogram based on the TMD Sholl is shown in Fig. 2f. The horizontal axis of the dendrogram represents all the neurons in this dataset, while the vertical axis represents the distance between clusters. Interestingly, interneurons were fully detected by every single descriptor, suggesting that this cell type is characterized by a unique set of morphological features.
Figure 2: Representative neuronal reconstructions of (a) Purkinje cell, (b) interneuron, and (c) retinal ganglion cell, in magenta, blue, and green, respectively. Representative dendrograms for morphological parameters extracted from (d) the L-Measure software and (e) Sholl descriptors. Dendrograms based on descriptors: (f) TMD for dataset 1, (g) tortuosity for dataset 2, (h) branching pattern for dataset 3, and (i) flux for dataset 4. For the reconstructions (a,b,c), the red dot represents the soma and the green dot is the barycenter of all nodes.
Table 1: Numbers represent percentages. Green represents complete detection of a class (100%), while pink represents detection rates between 80% and 100%. The taper rate descriptor was only run on dataset 2, since measurements of dendritic width were available for this dataset.
We compared the performance of our Sholl descriptor methods to conventional clustering techniques to determine whether these methods can clearly separate classes of neurons in this dataset. First, we used morphological parameters from our eight descriptors to represent neurons as vectors (see §4.13), and applied a hierarchical cluster analysis algorithm. Next, we extracted 10 morphological parameters for this dataset using the L-Measure software [28] and applied the same cluster analysis algorithm. To make a fair comparison, we chose specific morphological parameters from L-Measure to match features that are captured by our descriptors (i.e., number of branches, leaves, bifurcations, path distance, and Euclidean distance). Fig. 2 shows dendrograms of the linkage distances between the 29 neurons based on L-Measure-extracted features (Fig. 2d) and Sholl descriptors (Fig. 2e). Neurons are color coded according to morphological type. Cluster analysis based on features captured by the descriptors results in clear separation of classes into three clusters (Fig. 2e), whereas the L-Measure method returns two clusters with significant intermingling of neurons from the three types (Fig. 2d). Therefore, our results demonstrate that our combined descriptors can outperform conventional methods in clustering different neuronal cell types.
Distinguishing among classes based on a single feature
Dataset 2 included 67 neurons from five different regions of the mouse brain: retinal ganglion cells (n=10), basal ganglia medium spiny neurons (n=15), somatosensory stellate cells (n=9), hippocampal pyramidal cells (n=11), and somatosensory Martinotti cells (n=22). This is the only dataset that included dendritic width in the reconstructions, which allowed us to implement the taper rate descriptor. The dendrogram in Fig. 2g shows the cluster analysis based on the tortuosity descriptor function. This Sholl descriptor resulted in efficient separation of neuronal cell types, particularly for the Martinotti cells (detection rate=91%) and hippocampal pyramidal cells (detection rate=92%). Table 1 reports the detection rates (see §4.7) for all descriptor functions for dataset 2. Detection rates highlighted in pink are above 80%, while rates of 100% are highlighted in green. For comparison, the leaf index descriptor performed poorly in separating four of the five neuronal cell types, as evidenced by the low detection rates. This suggests that this particular morphological feature is largely uniform across these cell types (Martinotti, medium spiny, pyramidal, and stellate).
Dataset 3 comprises pyramidal cells in layer 3 of different cortical areas of the vervet monkey brain: primary visual cortex (V1) (n=10), V2 (n=10), and V4 (n=10). These reconstructions consisted of only basal dendrites. Prior reports have revealed regional differences in pyramidal cell morphology in the monkey brain [4, 8]. Specifically, pyramidal cell size, dendritic complexity, and spine density increase from primary visual cortex (V1) to higher order visual areas. Therefore, as a proof of concept we sought to recapitulate these findings by running our descriptors on reconstructions of pyramidal neurons from different areas in the visual cortical hierarchy. We expected at least to cluster pyramidal neurons from V1, V2, and V4, based on the branching pattern descriptor. Indeed, the dendrogram in Fig. 2h based on this descriptor reveals excellent separation of V1 neurons with some intermingling among V2 and V4 neurons. The wiring descriptor performed equally well in clustering, with excellent separation of V1 neurons and V4 neurons, and reasonable separation of V2 neurons (Table 1).
Subclustering within a neuronal class
To ensure sufficient coverage of neurons from different species, we also tested our Sholl descriptors in clustering pyramidal cells from different cortical areas in the rat brain. Dataset 4 consisted of rat pyramidal cells in layer 5 of somatosensory cortex (n=20), secondary motor cortex (n=15), and medial prefrontal cortex (n=19). The discriminative accuracy in separating the three neuron groups was very high for many of the descriptor functions. For example, the cluster analysis based on the flux descriptor, shown in the dendrogram in Fig. 2i, resulted in nearly perfect clustering. The detection rate was highest in both medial prefrontal and somatosensory cortex (100% detection), followed by secondary motor cortex (94% detection). Likewise, the branching pattern, wiring, and TMD Sholl descriptors performed equally well, as shown in Table 1. Remarkably, the combined descriptor approach yielded complete separation with three distinct clusters (Supplementary Fig. 12a) (see §4.14). Interestingly, the majority of pyramidal cells in secondary motor cortex formed their own distinct cluster, while several of these cells were clustered with other pyramidal cells in primary somatosensory cortex. This suggests the existence of two subpopulations of pyramidal cells in secondary motor cortex. Indeed, when we visually examined these neurons, we found striking similarities in morphology with pyramidal cells in primary somatosensory cortex. Therefore, the discriminative performance of certain descriptors is sufficient to separate neurons in different cortical areas of the rat brain and, more importantly, is powerful enough to reveal sub-clustering within a population of neurons.
Morphological aberrations
Revealing morphological aberrations resulting from neurodevelopmental and acquired disorders is an important step in understanding the pathophysiology of these diseases. Thus, unbiased methods to distinguish and separate normal neuronal morphology from aberrant morphology become essential. Dataset 5 included pyramidal neurons in layer 5 of rat somatosensory cortex under control (n=20) and experimental (n=16) conditions. The source study assessed morphological changes of cortical pyramidal neurons in hepatic encephalopathy. Interestingly, the authors report that although dendritic arbors remained unchanged in rats with hepatic encephalopathy, dendritic spine density was significantly reduced [29]. Indeed, as one would expect, the detection rates based on all the descriptors were low (Table 1), suggesting that neurons from the control group and the experimental group were virtually indistinguishable. Unsurprisingly, the combined descriptor approach (Supplementary Fig. 12b) (see §4.14) resulted in intermingling of neurons from the control and experimental groups. These results confirm previous findings that neuronal morphology is largely unaltered in rat cortical neurons with hepatic encephalopathy. More importantly, although the original study only assessed path length and number of terminal ends in the control versus experimental condition, we show using multiple Sholl descriptors that additional parameters related to neuronal morphology are in fact comparable between the two neuron groups. Nevertheless, although our descriptors do not reveal any structural differences between neurons in the two groups, there may be other features that differ which our descriptors do not capture. In a future study, we intend to construct additional Sholl descriptors that may potentially reveal structural differences between control and experimental neurons in this dataset.
Classification and metric learning
Finally, we apply our descriptors to a relatively large dataset of 405 neurons in order to classify them. Dataset 6 is comprised of: retinal ganglion cells (n=83), basal ganglia medium spiny neurons (n=157), hippocampal granule cells (n=76), hippocampal pyramidal cells (n=49), and mPFC pyramidal cells (n=40). A classification scheme is then used which aims to (i) generate a single metric that can differentiate the classes, and (ii) assign a new neuron to the given class with which it shares the most features. This was accomplished by first implementing all Sholl descriptors on this dataset to determine which ones resulted in the best detection rates (‘feature selection’). We subsequently chose these features (i.e. descriptors) for inclusion in our classification scheme. Specifically, Table 1 shows that the energy and total wiring descriptors were ineffective in distinguishing among classes in this dataset, as evidenced by low detection rates (below 80%). Therefore, these descriptors were removed and only the descriptors that performed well in separating these classes were included (flux, leaf index, branching pattern, tortuosity, and TMD). Next, we vectorize the neurons based on these descriptors, resulting in classes of vectors in Euclidean space (in ℝ²⁵). Principal Component Analysis (PCA) is subsequently used to reduce dimensions (Fig. 3a). Neurons from different classes in this dataset were largely overlapping and poorly separated. Data are then fitted and transformed into a new metric space using the Large Margin Nearest Neighbor (LMNN) metric learning algorithm, which learns a Mahalanobis distance metric in the K-Nearest Neighbor (KNN) classification setting [30] (see §4.15). This powerful new approach results in excellent separation of classes in this dataset. Fig. 3b shows the data plotted in the new transformed space. In Fig. 3c, we introduce a hippocampal granule neuron (so we have a priori knowledge of its class) into this dataset. The KNN classifier successfully predicted the class of the newly introduced cell. Note that the distances in the new metric space from the newly introduced cell to all 5 classes are depicted in Fig. 3c. Finally, Fig. 3d shows the result of an over-fitting test where we permuted the vectors among our five classes and plotted the data using PCA in the newly transformed space. The plot shows fairly poor separation, which is an indication that our selected features were geometrically meaningful in classifying this dataset (see §4.16).
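For readers who wish to reproduce this pipeline, the following is a minimal sketch assuming the metric-learn and scikit-learn packages; the arrays X and y stand in for the vectorized neurons and their class labels (random placeholders here), and all hyperparameters are illustrative rather than the exact settings used in this study.

    # Sketch of the dataset 6 pipeline: vectorize -> LMNN -> KNN.
    # X and y are random placeholders for descriptor vectors and labels.
    import numpy as np
    from metric_learn import LMNN
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(405, 25))     # stand-in for vectors in R^25
    y = rng.integers(0, 5, size=405)   # stand-in for the five class labels

    # Learn a Mahalanobis metric that pulls same-class neighbors together.
    lmnn = LMNN()
    lmnn.fit(X, y)
    X_lmnn = lmnn.transform(X)         # data in the learned metric space

    # PCA projections before and after metric learning (cf. Fig. 3a,b).
    before = PCA(n_components=2).fit_transform(X)
    after = PCA(n_components=2).fit_transform(X_lmnn)

    # KNN classification of a newly introduced neuron in the learned space.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_lmnn, y)
    new_neuron = rng.normal(size=(1, 25))
    predicted_class = knn.predict(lmnn.transform(new_neuron))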
Figure 3: Classification of neurons in dataset 6. Each dot represents a neuron, color coded according to its class. PCA was performed to reduce dimensions in (a-d). (a) Vectorized neurons in Euclidean space. (b) KNN classification in the new metric space. (c) Distances from a newly introduced neuron to all 5 classes. (d) Overfitting test by permuting the classes.
3. Discussion
In this work we introduce a novel method of comparing descriptor functions of tree structures for the classification of neuronal cell types. Importantly, we obtained substantially better clustering results when we compared the performance of our descriptors with conventional methods. By constructing a metric-space valued function for a single neuron (a Sholl function) to capture the evolution of a particular morphological feature as a function of distance from the soma, we are able to compare Sholl functions for all neurons in a dataset. We illustrate that certain descriptor functions can effectively cluster classes of neurons with subtle morphological variations, as well as discriminate among widely different classes of neurons in agreement with expert assignment. Additionally, we leverage metric learning techniques to provide more robust classification. Our framework is powerful enough to separate diverse classes of neurons across different brain regions and species. Our results reveal several key findings regarding this toolkit of descriptors.
The six representative datasets used in this study were chosen to ensure morphological diversity and are thus derived from different areas, layers, and species. In dataset 2 (different types of neurons taken from different regions of the mouse brain), we show that the TMD and tortuosity descriptors performed very well in clustering this dataset. Specifically, based on the tortuosity descriptor, the Martinotti and pyramidal cells each formed their own cluster. Interestingly, dendritic tortuosity has been shown to vary among different non-pyramidal neuron classes in the rat brain, whereby Martinotti cells in layers II/III and V of the frontal cortex have higher tortuosity than other cell types [31]. In the mouse brain, dendritic tortuosity increases as a function of increasing branch order on apical dendrites of hippocampal CA1 pyramidal cells [32]. Additionally, dendritic tortuosity of layer II/III pyramidal cells appears to increase from caudal to rostral regions in mouse cortex [33]. Our tortuosity descriptor is therefore powerful in distinguishing among neuron groups with non-uniform dendritic tortuosity. The upper limit for tortuosity values appears to be 2, which is consistent with prior reports [31]. Importantly, we improved discrimination accuracy by using a combination of descriptors, which effectively assigns weights to the functions with the best separation results.
Anatomical studies have shown that interneuron morphology is highly diverse in the cerebral cortex. For example, interneurons with similar somatodendritic morphology may differ in axonal arborization patterns [13]. Therefore, axonal morphometric features are typically required for accurate classification of interneurons, as they have been shown to capture important differences among interneuron subtypes [34]. We did not analyze axonal features in our descriptors, which could explain why Martinotti and medium spiny neurons were largely intermingled (dataset 2). However, based on dendritic features alone, some of our descriptors (tortuosity and TMD) were able to reliably distinguish an interneuron subtype (Martinotti) from other neuronal cell types such as Purkinje and retinal ganglion cells (dataset 2). In a future study, we will focus our efforts on interneuron subtypes in order to incorporate important axonal features into our descriptors for a more accurate classification scheme.
Prior work has revealed regional differences in pyramidal cell morphology in the monkey brain. Specifically, in the Old World macaque monkey, pyramidal cells become progressively larger and more branched with rostral progression through V1, the secondary visual area (V2), the fourth visual area (V4), and inferotemporal cortex (IT) [4, 8, 36]. Therefore, we were interested in testing whether our descriptors can detect differences in the morphology of pyramidal cells from different visual cortical areas of the vervet monkey (dataset 3), another Old World monkey species. Indeed, we find that the branching pattern descriptor results in excellent clustering of cells from V1, V2, and V4. This suggests distinct differences in the branching pattern of basal dendrites of pyramidal cells residing in these areas. The wiring descriptor, which is a proxy for total dendritic length, yielded reasonable clustering of neurons, with some intermingling of neurons from all areas. This is not surprising given that in some species, such as the tree shrew, differences in pyramidal cell morphology throughout the visual cortical hierarchy are less pronounced [37]. Even in rodents, regional differences in pyramidal cell morphology appear to be less noticeable than in primates [38, 39]. Therefore, the fact that pyramidal cells in V2 are intermingled with cells in V4 in our cluster analysis based on the wiring descriptor reflects genuine similarities between these two populations of cells, and further suggests variations among neurons within each visual cortical area.
Collectively, the results from this study highlight the robustness of our framework in quantitatively characterizing and discriminating among different neuronal cell types. Certain morphological features, and thus specific descriptors, are better suited to separating distinct neuronal cell types. For instance, the branching pattern descriptor appears to perform very well in detecting most neuronal cell types. This descriptor measures how far from the soma, and how fast, nodes appear (bifurcations) and disappear (leaves). Conversely, the energy descriptor, which reveals the distribution of nodes around the soma, appears to reliably detect retinal ganglion cells, Purkinje cells, and interneurons. Importantly, our use of metric learning techniques resulted in improved classification. Progress in the development of unbiased clustering methods to distinguish among groups of neurons will further our understanding of the relationship between brain structure and function. The toolkit of morphological descriptors introduced here, and the development of new methods, will potentially lead to the discovery of novel subclasses of neurons [40]. Additionally, our descriptors will aid efforts to uncover differences between normal and aberrant neuronal morphology, which is commonly associated with various disease states. For instance, changes in dendritic morphology have previously been described in a number of disease states, including Alzheimer’s disease [15], schizophrenia [41], and mental retardation [42]. Given that our toolkit of descriptors discriminated among different types of cells as well as revealing subclasses of cells, its utility may be extended to the study of brain diseases, potentially identifying which subtypes may be affected in various disease states.
4. Methods
In this study we developed an unsupervised clustering and supervised classification framework based on relevant morphological features to differentiate among and classify different neuronal cell types. Given a topological or spatial feature of neurons, denoted by the Greek letter ϕ, such as branching pattern, tortuosity, total wiring, etc., we associate to such a feature a “Sholl descriptor”. Specifically, a Sholl descriptor is a map from the set of neurons to a metric space of functions, with reasonable properties (Definition 4.3). We then use this assignment to construct a metric dϕ on the set of all neurons (§4.5). In other words, we endow the set of neurons with a metric space structure for every topological feature ϕ. This Sholl metric measures how far apart neurons are with respect to that particular feature: the closer two neurons are under the metric, the more of that feature they share. The fundamental idea of a Sholl descriptor is presented in Fig. 4, where a branching pattern function is computed for a simple tree. The value of the descriptor for that neuron is the step function on the right.
Figure 4: The branching pattern Sholl descriptor (see §4.9.1). (a) A representation of a tree. (b) The corresponding branching function as a function of distance from the soma.
Sholl descriptors form a toolkit to analyze neuronal morphology. Descriptors combined with standard hierarchical clustering methods, a detection algorithm (also used for feature selection), grid search, and metric learning functions are used to cluster and classify a dataset of neurons.
Analysis based on a single descriptor:
(Clustering) Given a dataset of unlabeled neurons, we can run a particular descriptor in order to cluster them according to that descriptor. For example, if ϕ = T is tortuosity, we can compute the distance matrix for the associated Sholl metric dT and then run standard hierarchical clustering to obtain dendrograms. The obtained dendrogram reveals whether neuronal cell types differ according to their tortuosity (i.e. cluster together), or whether their tortuosity is comparable (cells from different neuron types will be intermingled).
(Detection) This method assesses the performance of a given descriptor in identifying a desired feature within a dataset. For example, we can run a given descriptor ϕ on a dataset of neurons to detect which set of neurons under which label is “grouped together” under the descriptor ϕ. So if ϕ is the branching pattern descriptor, we can represent three types of neurons in a dataset with the colors red, green, and lavender (Supplementary Fig. 11). This allows us to determine whether dϕ detects the red neurons within a certain percentage, that is, the detection rate of red neurons within a ball in that metric. Detection rates can be used for feature selection when running classification schemes.
Typically, given a dataset of neurons which is distributed among classes C1,…, Ck (and thus labeled), running the detection algorithm one Sholl descriptor at a time allows us to differentiate at least one class, but leaves the other classes undifferentiated. In other words, using the metric dϕ associated to the descriptor ϕ, one or more classes are singled out (i.e. their neurons cluster) while the remaining classes may be indistinguishable. However, the other classes may be differentiated by other descriptors. Therefore, by combining descriptors, we can obtain a complementary combined effect that better separates the classes.
Analysis based on a combination of descriptors (details in §4.12):
(Vectorization and unsupervised clustering) A given neuron can be converted into a vector using our descriptor functions and metrics. There are different approaches that may be used; we implement the one described in §4.13. Using the feature vectors, we then run a standard hierarchical clustering algorithm.
(Classification) Given a dataset of neurons distributed among a number of classes, and a number of morphological features, we can determine which class a newly introduced neuron is associated with (i.e. shares the most features with). More precisely, suppose we are given classes of neurons Ci, 1 ≤ i ≤ n, and morphological descriptors ϕ1,…,ϕk which can be measured for all neurons. Then, given a newly introduced neuron N, we can determine, with measurable likelihood, which class this neuron belongs to. Using metric learning [30], we can determine which features are comparable among the classes.
(Differentiation) By linearly combining descriptor metrics, we can separate classes and obtain a new clustering metric. This approach is similar to metric learning in that it will separate different classes, unless the classes are truly indistinguishable.
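As an illustration of this differentiation step, the sketch below grid-searches non-negative weights for a linear combination of per-descriptor distance matrices; using the silhouette score as the separation criterion is our illustrative choice, not necessarily the exact objective described in §4.14.

    # Grid-search a linear combination of descriptor distance matrices that
    # best separates labeled classes. `dist_mats` is a list of (n, n) Sholl
    # distance matrices; `labels` are the class labels of the n neurons.
    import itertools
    import numpy as np
    from sklearn.metrics import silhouette_score

    def best_combination(dist_mats, labels, grid=np.linspace(0.0, 1.0, 5)):
        best_w, best_score = None, -np.inf
        # Exhaustive over len(dist_mats) weights drawn from a coarse grid.
        for w in itertools.product(grid, repeat=len(dist_mats)):
            if not any(w):
                continue  # skip the all-zero combination
            combined = sum(wi * D for wi, D in zip(w, dist_mats))
            score = silhouette_score(combined, labels, metric="precomputed")
            if score > best_score:
                best_w, best_score = w, score
        return best_w, best_score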
4.1. Notation and Terminology
Capital letter N represents a neuron seen as a tree in 3-space.
A class C of neurons is a set containing a selection of neurons of a particular type.
A node in a neuron N is either the soma, a bifurcation point, or a termination point.
“Branch point” is used interchangeably with “bifurcation point”, and “leaf” with “termination point”.
The number of terminal nodes in a tree is called its degree, which is a proxy for tree complexity. The number of branches of a tree is twice its degree minus one, while the number of bifurcations is its degree minus one.
Radial distance means Euclidean distance as measured from a point to the soma.
Path distance is the distance along dendrites.
Two nodes are parent-child related if they are adjacent on a branch. The node closer to the soma in path distance is called the parent, and the node farther away from the soma is called a child.
A branch is the part of the dendrite between a parent and any of its children (at most two).
R(N) is the “span” of the neuron, that is the largest radial distance of any of the nodes (typically a leaf).
A neuron has span R(N) if it can fit in a ball of radius R(N) and in no smaller ball.
L(N) is the length of the longest dendrite stemming from soma and ending at a termination point.
A neuronal feature is denoted by the Greek letter ϕ. It is always topological or morphological in nature, and its associated (Sholl) descriptor will also be denoted by ϕ.
A feature ϕ gives rise to a metric on the set of neurons which is denoted by dϕ.
4.2. Representation of neurons
We model a neuron N as a collection of rooted binary trees embedded in 3-space ℝ³, all having a common root (the soma). These rooted trees are also called the “primary” trees. In this paper, all 3D neuronal reconstructions are acquired from the public repository NeuroMorpho.org [27]. The morphological structure of an individual neuron is retrieved from an SWC file, which contains a digital representation of the neuron as a tree structure consisting of points/markers. Each marker has associated properties such as 1) its 3D spatial coordinates, 2) its radius, denoting the thickness of the branch segment at that specific 3D location, 3) a node type indicating whether it is soma, axon, or dendrite, and 4) one parent marker to which it directly connects through the neuronal arbors. Well-defined geometric constructions on neurons need to be invariant under the affine isometry group of ℝ³, and since the soma is always at the origin of any reference frame, it is sufficient to consider only invariance under rotations and reflections fixing the soma, and to disregard translational invariance. Geometric constructions that only depend on conformal measures, like angle and distance, can result in interesting geometric and topological invariants for neurons.
4.3. Sholl descriptors
Below we detail the construction and definition of Sholl descriptors.
A “Sholl descriptor” is any rule that associates to a given neuron (seen as a tree embedded in ℝ³) a compactly supported function whose independent variable is the distance from the soma, either path or radial, and whose values are in a metric space X. We further require this function to be both isometry invariant and stable.
More precisely, for a given neuron N ⊂ ℝ³, a Sholl descriptor ϕ associates a function

ϕN : I → X

which is supported on either I = [0, R(N)] or I = [0, L(N)] (for definitions, see §4.1). Here X is a metric space, and ϕ is stable in the sense of §4.10. Isometry invariance means that if N′ is obtained by rotating or reflecting N about, respectively, a line or a plane passing through the soma, then ϕN and ϕN′ are identical functions. As we will explain below, our constructions are independent of scale, so we can consider that all of our neurons are normalized to be in a ball of radius 1 in space (see §4.4). So, succinctly, a Sholl descriptor associates to every neuron N a Sholl function [0, 1] → X, with X a metric space, satisfying the stability and isometry invariance properties.
The following are the Sholl descriptors that we discuss in this paper (details in §4.9).
Branching pattern: This is an integer-valued descriptor given by the number of bifurcations minus the number of leaves, all within a given radius r from the soma. As the radius changes, this number changes. We get a different Sholl descriptor if we consider the same quantity (number of bifurcations − number of leaves) but within path distance r from the soma.
Tortuosity: Tortuosity between two nodes is measured as the quotient of the path distance by the Euclidean distance. The tortuosity descriptor measures the mean tortuosity between all adjacent nodes within radius r from the soma.
Flux: This associates to a given radius r the sum of the cosines of the angles between the dendrites and the normal directions to the sphere of radius r centered at the soma, at the points where the dendrites intersect that sphere. The flux construction is related to, and can be viewed as an extension of, the root-angle construction in [43], which considers, at any given leaf, the angle between the radial normal and the main branch ending at that point.
Taper Rate: This is based on the width of dendritic segments at bifurcations, taken as a function of path distance to the soma. Dendritic tapering is a measure of the change in width along a dendritic segment from node to node.
Leaf index: This construction counts the number of terminations emanating from every node, plotted as a function of radial distance. If the node is taken to be the soma, this number is the total number of leaves; if the node is taken to be a leaf, the value is one (that leaf itself).
Energy: Viewing the nodes as charged particles, they generate an electric vector field. The resulting combined field vector at the soma (by the superposition principle) is then measured. This vector changes as the number of nodes increases, giving us a Sholl descriptor. This descriptor detects the position of the soma with respect to the nodes. If we divide space into octants, with the soma at the origin, then the more nodes present in the same octant, the greater the energy.
Total wiring: This construction measures the total wiring of a neuron within a given sphere of radius r centered at the soma. This is the sum of the lengths of all dendritic segments within that sphere.
TMD: This is the Topological Morphological Descriptor of [24], redesigned to be a Sholl function. The metric space target in this case is the space of persistence diagrams with the Wasserstein metric.
4.4. Normalization
Given a Sholl descriptor ϕ̃, we can normalize it so that all functions are supported on [0, 1]. Let ϕ̃N be the descriptor function for a neuron N. If ϕ̃N is supported on [0, R(N)], where R(N) is the span of the neuron, that is the radial distance to the furthest point in N from the soma (see §4.1), define the corresponding normalized descriptor ϕN by:

ϕN(r) := ϕ̃N(r · R(N)), r ∈ [0, 1].

If ϕ̃N is a function of path length, and thus supported on [0, L(N)], then we normalize in the same way, with L(N) replacing R(N). All our descriptors are normalized and supported on [0, 1]. The constructions we provide are such that all real-valued Sholl functions we consider are step functions. Fig. 5 illustrates these step functions for a very simple granule cell in mouse olfactory bulb.
Figure 5: This neuron was chosen because of its simple branching pattern, which consists of 3 bifurcation points and 7 leaves. Notice that all neurons have been normalized to be included in a ball of radius 1 centered at the soma. The step function for the “taper rate” descriptor is missing for lack of data. Also, the “TMD” descriptor is missing here, as it is not real-valued and cannot easily be presented.
4.5. Functional Metrics
Each Sholl descriptor defines a metric on the set of neurons. Let 𝒩 be a given set of neurons, and let ϕ be a normalized Sholl descriptor which associates to each neuron N a descriptor function ϕN : I → X, with X a metric space. We assume that no two neurons are identical with respect to any feature we define. We thus have an inclusion (i.e. an injective map)

𝒩 → Map(I, X),  N ↦ ϕN.

Let d be any metric on the space of functions Map(I, X). It induces a metric on 𝒩 by setting

dϕ(N1, N2) := d(ϕN1, ϕN2).  (1)

The distance we choose to work with is the “L1 distance”

d(ϕN1, ϕN2) := ∫₀¹ dX(ϕN1(r), ϕN2(r)) dr.  (2)

All ϕN constructed in this paper are real-valued step functions, except for the Sholl-TMD in §4.9.8. We give the formula for the distance in this case. Let ϕ1, ϕ2 be two step functions with jumps at radii r1,…, rq and s1,…, sℓ respectively. This means that ϕ1 is constant on [ri, ri+1[, and similarly ϕ2 is constant on [sj, sj+1[. Let

{t1,…, tm} := {r1,…, rq} ∪ {s1,…, sℓ},

and order the ti's by increasing magnitude, so we can assume 0 ≤ t1 < t2 < ⋯ < tm ≤ 1. Then the L1-distance between the step functions is given by

d(ϕ1, ϕ2) = Σ_{i=0}^{m} |ϕ1(ti) − ϕ2(ti)| (ti+1 − ti),  (3)

where we set t0 = 0 and tm+1 = 1. Other choices of metrics we can work with for real-valued Sholl functions are the Lp metrics for p > 1, or the sup metric d(ϕN1, ϕN2) := sup_{r∈[0,1]} |ϕN1(r) − ϕN2(r)|. This last metric is known to induce the compact-open topology on the space of functions when X is a compact regular metric space. Once we can measure functional distances between descriptor functions, we can measure “Sholl distances” between neurons as indicated in (1). These distances are then used to cluster and classify neurons.
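A direct implementation of formula (3) is straightforward. In the sketch below, a step function is encoded by its jump locations (starting at 0) together with the value it takes on each interval; this encoding is our assumption for illustration.

    # L1 distance (3) between two step functions on [0, 1].
    # breaks: increasing jump locations starting at 0.0;
    # vals[i]: value taken on [breaks[i], breaks[i+1][ (vals[-1] up to 1).
    def step_l1(breaks1, vals1, breaks2, vals2):
        def value_at(breaks, vals, t):
            i = 0
            while i + 1 < len(breaks) and breaks[i + 1] <= t:
                i += 1
            return vals[i]
        ts = sorted(set(breaks1) | set(breaks2) | {1.0})
        return sum(abs(value_at(breaks1, vals1, a)
                       - value_at(breaks2, vals2, a)) * (b - a)
                   for a, b in zip(ts, ts[1:]))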
4.6. Clustering
The ultimate goal is to find measurable and quantifiable morphological differences between classes of neurons. Given a Sholl descriptor ϕ and a selection of neurons N1,…, Nq, the standard procedure is to generate the distance matrix associated to the descriptor,

Mϕ := (dϕ(Ni, Nj))1≤i,j≤q.

This is a symmetric matrix with positive entries and zeros along the diagonal. Any such distance matrix produces a dendrogram using standard hierarchical clustering algorithms. It is not reasonable to expect a single descriptor to faithfully cluster a given set of classes of neurons (Supplementary Fig. 8). The advantage of developing multiple descriptors based on various morphological features is that it reveals which features are uniform and which are different among classes of neurons. Combining descriptors to differentiate between classes of neurons is another method we use. This combination can be achieved at the level of distance matrices, or at the level of Sholl descriptors, since these form a vector space of functions (in fact an algebra). Indeed, given two normalized Sholl descriptor functions ϕ1, ϕ2 : I → ℝ, we can take linear combinations. Such a sum is also stable, as defined in §4.10, if we start with stable descriptors.
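In practice, a per-descriptor distance matrix Mϕ is fed to a standard hierarchical clustering routine to produce the dendrograms mentioned above. A minimal sketch with SciPy follows; the average-linkage choice is ours for illustration, as the text does not commit to a particular linkage.

    # Dendrogram from a precomputed Sholl distance matrix.
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import squareform

    def sholl_dendrogram(dist_matrix, labels=None):
        condensed = squareform(dist_matrix, checks=False)  # condensed form
        Z = linkage(condensed, method="average")           # linkage tree
        return dendrogram(Z, labels=labels, no_plot=True)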
4.7. Detection and Feature Selection
Let C1,…, Ck be k distinct classes of neurons. Each class consists of neurons to be compared with neurons from other classes. We say that a descriptor ϕ has at least an n% level of detection of a class Ci if there is some ϵ-ball B in the dϕ metric so that more than n% of all elements of Ci are within B, and, of all elements in B, more than n% are from Ci.
Example 4.2
Suppose we have three classes of neurons C1, C2, C3, each consisting of 5 neurons. Let ϕ be a given Sholl descriptor, and suppose there is an ϵ-ball in the dϕ-metric that contains 4 elements of C1 and 2 elements from C2 ∪ C3. This ball contains 4/5 of all C1-neurons (i.e. 80%), while 4/6, or 66%, of all neurons in this ball are C1-neurons. We say that the descriptor ϕ has detected C1 to at least a 66% level, the lower of the two percentages 80% and 66%. If for a smaller ϵ we still have as many neurons from C1 in the smaller ball but we lose one neuron from C2, then detection is at 80%.
The detection algorithm is described in the supplementary material. We also use detection as a method for feature selection when we run several descriptors on a given set of classes. Descriptors with detection rates below a certain percentage are deemed ineffective in differentiating among these classes and can thus be excluded from further analysis (see §4.15).
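The sketch below illustrates one straightforward reading of this definition (the actual algorithm is in the supplementary material): search over balls centered at the data points and report the best value of min(recall, purity) for the class of interest.

    # Detection rate of class `c` under a Sholl distance matrix `dist`.
    import numpy as np

    def detection_rate(dist, labels, c):
        labels = np.asarray(labels)
        in_class = labels == c
        best = 0.0
        for center in range(len(labels)):
            for eps in np.unique(dist[center]):
                ball = dist[center] <= eps
                recall = (ball & in_class).sum() / in_class.sum()
                purity = (ball & in_class).sum() / ball.sum()
                best = max(best, min(recall, purity))
        return best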
4.8. Combination of Descriptors and Classification
A single descriptor may detect features within a family of neurons, but alone it may not be able to differentiate between many classes at once. The idea of “combining” several descriptors into one single descriptor offers a more effective tool for differentiating between classes (this is what we call classification). The Sholl descriptor metrics are well suited to provide such a classification. We have devised three combination methods, each applicable within its own specific context. These were discussed at the beginning of §2, and the details can be found in the supplementary material (§4.12).
4.9. The Sholl Descriptors
Given a neuron N viewed as an embedded tree in space, we define

Nr := N ∩ Br.  (4)

Here Br is the ball of radius r around the soma.
4.9.1. The Branching Pattern Descriptor
This morphological descriptor detects patterns that result from the distribution of branches and leaves relative to the soma (see Fig. 4). Dendrites emanate radially from the soma, branching in a binary way. Branches and leaves appear and disappear as we move away from the soma, and a measure of this birth and death of branches and leaves gives rise to a function of the radius that we call the “branching pattern” function.
Let N be a neuron which we view as a collection of rooted binary trees in ℝ³, with the common root being the soma. Label B1,…, Bq the branch points of N and label all leaves by L1,…, Lk. Let r > 0 be the radial distance measured away from the soma. Order the branch points and leaves by increasing r, so that if ri indicates the distance of the i-th node to the soma, we have 0 < r1 < r2 < ⋯ < rq+k (equal radii can be removed by an infinitesimal perturbation).
Fixing a neuron N as before, associate to each r ∈ I the number α(r) defined by

α(r) := #{branch points Bi within radial distance r of the soma} − #{leaves Lj within radial distance r of the soma}.

Let R(N) be the span of the neuron N and define the function

ϕ̃N : [0, R(N)] → ℤ,  ϕ̃N(r) := α(r).

The normalized version takes the form

ϕN(r) := ϕ̃N(r · R(N)),  r ∈ [0, 1].

This defines our branching pattern descriptor ϕ. Note that

ϕN(1) = −(number of primary branches),

since the number of primary branches is the difference between the number of leaves and the number of bifurcation points.
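In code, the construction amounts to sorting birth/death events by radius. The sketch below returns the normalized step function in the (breaks, vals) encoding used in the L1-distance sketch of §4.5.

    # Branching pattern Sholl function from radial distances of bifurcations
    # and leaves: alpha(r) = #bifurcations - #leaves within radius r.
    def branching_pattern(bif_radii, leaf_radii):
        R = max(bif_radii + leaf_radii)               # span R(N)
        events = sorted([(r / R, +1) for r in bif_radii] +
                        [(r / R, -1) for r in leaf_radii])
        breaks, vals, running = [0.0], [0], 0
        for radius, delta in events:
            running += delta
            breaks.append(radius)
            vals.append(running)
        return breaks, vals  # vals[-1] == -(number of primary branches)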
Example 4.3
In Fig. 6, tree structures of two different neurons are shown: (a) pyramidal and (b) stellate. The corresponding Sholl descriptor functions reveal an obvious difference (c). The red curve, which depicts the branching pattern of the pyramidal cell, reveals that branching occurs rapidly close to the soma, but much more slowly further away from the soma. Conversely, branching for the stellate cell changes uniformly and steadily as one moves away from the soma. Both neurons have similar branching counts: neuron (a) has 21 bifurcations and 30 leaves, while neuron (b) has 32 bifurcations and 49 leaves. This gives ϕN1(1) = −9 and ϕN2(1) = −17, as depicted.
Figure 6: (a) Pyramidal and (b) stellate cell. (c) The step functions for each neuron generated from the branching pattern Sholl descriptor. The branching Sholl functions show that neuron (a) branches quickly near the soma, and its leaves appear much closer to the soma than for neuron (b). The value of the corresponding function at 1 equals minus the number of primary branches. (d) Pyramidal cell in secondary motor cortex and (e) in mPFC. (f) The step functions for neurons (d) and (e) generated from the total wiring Sholl descriptor. The sharp increase towards the end of the wiring Sholl function for cell (d) reveals the existence of an apical tuft. (g) Interneuron and (h) Purkinje cell. (i) The step functions for each neuron generated from the energy Sholl descriptor. The two energy Sholl functions show completely distinct features. We obtain maximal energy values for Purkinje cells (a defining feature): the energy value at the soma for neuron (h) is 200000 units, while this value is 50 for neuron (g). (j) Martinotti and (k) retinal ganglion cell. (l) The step functions for each neuron generated from the tortuosity Sholl descriptor. The red dot represents the soma and the green dot is the barycenter of all nodes.
4.9.2. Tortuosity Descriptor
Represent a neuron by an embedded tree in space, and label its nodes P1,…, Pn ∈ ℝ³. For any two nodes, we can consider both the path distance and the Euclidean distance between them. If Pi is a parent node and Pj is its child node, let bi,j be the dendritic path distance between these nodes, and let di,j be the length of the segment [Pi, Pj]. The ratio of the two distances is

T(Pi, Pj) := bi,j / di,j.

Consider a neuron N with n nodes (§4.1). Let (parent, child) be a pair of adjacent nodes. There are exactly n such pairs, coinciding with the number of branch segments. Define the average tortuosity of N to be the average sum

α(N) := (1/n) Σ(i,j) bi,j / di,j,

where the sum runs over all parent-child pairs (Pi, Pj).
It is clear that 1 ≤ α(N) for all choices of N. The (non-normalized) Sholl descriptor function associated to this construction is now given as follows: order the nodes of N by increasing radii as before, and define

TN(r) := α(N ∩ B(r)),

the average tortuosity over all parent-child pairs lying within B(r), where B(r) is the ball of radius r around the soma. We then take the normalized version (§4.4).
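As an illustration, the sketch below builds the tortuosity step function from a precomputed list of parent-child segments; the triple encoding of the segments and the initial value 1 before the first node enters the ball are our assumptions.

    # Tortuosity Sholl function: running mean of path/Euclidean ratios over
    # all parent-child pairs within the (normalized) radius of the child.
    def tortuosity_function(pairs):
        # pairs: (normalized_child_radius, path_length, euclidean_length)
        pairs = sorted(pairs)
        breaks, vals, ratios = [0.0], [1.0], []
        for radius, path_len, euclid_len in pairs:
            ratios.append(path_len / euclid_len)
            breaks.append(radius)
            vals.append(sum(ratios) / len(ratios))  # running mean
        return breaks, vals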
4.9.3. Taper Rate Descriptor
We start with a neuron N and list all path distances of the nodes to the soma in increasing order 0 < ℓ1 < ⋯ < ℓk. Each node has a dendritic thickness (or width) that tapers as we move away from the soma along the dendrite. We can measure the tapering rate as a function of path distance: for each ℓi, we record the average change of width per unit length over the branch segments lying within path distance ℓi of the soma, and then take the associated normalized Sholl descriptor by dividing each ℓi by the length of the longest dendrite. This is a Sholl function whose variable is path length and not radial distance.
4.9.4. Flux Descriptor
We define the Sholl descriptor F and the associated flux functions for a given neuron N. Let Ñr be the connected component of N ∩ Br containing the soma, where Br is a ball of radius r centered at the soma. Notice that Ñr can be different from Nr (4) if there are dendrites that leave Br and then enter it again. If a dendrite crosses the boundary sphere Sr at a point P ∈ N ∩ Sr, we identify the parent node A and the child node B lying on either side of the sphere. The direction vector from parent to child points outward if A is inside the sphere, and points inward if A is outside. Consider the segment [A, B] and let C be the point where the segment cuts the sphere. We then assign the value

cos θ := ((B − A) · C) / (|B − A| |C|).

This is the cosine of the angle between the unit vector along [A, B] and the normal to the sphere passing through C. This value is maximal if B − A is aligned with the radial vector at C, i.e. when the angle is zero.
To define the total flux function, order the nodes of N as before by increasing values of their distances from the soma, 0 < r1 < ⋯ < rk. For every r, take the sphere of that radius, look at all dendrites intersecting that sphere at points P1,…, Pm, and add up the values obtained from the construction outlined above:

F(r) := Σi cos θi,

summing over the intersection points. This gives rise again to a step function FN : [0, 1] → ℝ by setting

FN(r) := F(r · R(N)).
Explicitly, if A = (a1, a2, a3) is the parent marker and B = (b1, b2, b3) the child marker, such that either |OA| < r < |OB| or |OB| < r < |OA| (that is, they lie on different sides of the sphere), the point of intersection C = (c1, c2, c3) of that sphere with the segment [A, B] is obtained by setting C = (1 − t)A + tB and solving |C| = r for t through a quadratic. The flux value at C is then

((B − A) · C) / (|B − A| |C|).
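The intersection step admits a short worked implementation: substituting C = (1 − t)A + tB into |C|² = r² gives a quadratic in t, whose root in [0, 1] locates the crossing.

    # Flux contribution of a segment [A, B] crossing the sphere of radius r.
    import numpy as np

    def flux_at_crossing(A, B, r):
        A, B = np.asarray(A, float), np.asarray(B, float)
        d = B - A
        # |A + t d|^2 = r^2  <=>  (d.d) t^2 + 2 (A.d) t + (A.A - r^2) = 0
        a, b, c = d @ d, 2 * (A @ d), A @ A - r * r
        disc = np.sqrt(b * b - 4 * a * c)  # real, since the segment crosses
        t = next(t for t in ((-b + s * disc) / (2 * a) for s in (1, -1))
                 if 0 <= t <= 1)
        C = (1 - t) * A + t * B
        return (d @ C) / (np.linalg.norm(d) * np.linalg.norm(C))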
4.9.5. The Leaf Index Descriptor
From each node grows a new dendritic tree with a number of terminal points. Counting the number of these terminal points for each node P gives the “leaf index” of P, and we write it as li(P). When P is a leaf, we set li(P) = 1. Figure 1b depicts the construction of this descriptor.
As before, given a neuron N, order its nodes P1,…, Pk by increasing distance to the soma, 0 < r1 < ⋯ < rk, and define the leaf index Sholl descriptor as the step function

LIN(r) := li(Pi) for r ∈ [ri, ri+1[,  with LIN(0) := li(soma),

normalized as in §4.4. Evidently LI(1) = 1, which is the value at the furthest leaf, while LI(0) is the total number of leaves. This is again a step function, and distances between leaf index Sholl functions are given by the standard formula (3). The next figure gives an example of a neuron and its associated leaf index Sholl function.
4.9.6. Total Wiring Descriptor
“Total wiring” is a morphological feature which measures the total dendritic length of neurons. When used as a Sholl descriptor, it gives the total length of the dendrites, but also reflects their density as we move away from the soma.
Given a neuron N, let tℓ(N) be the total length of all dendrites of N. If Nr is the part of the neuron within a sphere of radius r from the soma (4), then

TLN(r) := tℓ(Nr).

This is a normalized Sholl function, which always starts at value 0 and ends at value TLN(1) = tℓ(N), the total wiring of the neuron. As for other Sholl functions, we will only consider the step function version of this construction, where once more one defines, for r ∈ [0, 1],

TLN(r) := tℓ(Nri) for r ∈ [ri, ri+1[,

where 0 < r1 < ⋯ < rn = 1 are the normalized radial distances of the nodes listed in increasing order.
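A sketch of the step-function version: accumulate segment lengths as nodes enter the ball. Counting a whole segment once its distal node enters is a simplification of tℓ(Nr), which would truncate segments at the sphere; it is adequate for illustration.

    # Total wiring step function from (normalized_child_radius, length) pairs.
    def total_wiring(segments):
        segments = sorted(segments)
        breaks, vals, total = [0.0], [0.0], 0.0
        for radius, length in segments:
            total += length
            breaks.append(radius)
            vals.append(total)
        return breaks, vals  # vals[-1] == total wiring tl(N)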
4.9.7. Energy Descriptor (Nodal Distribution)
Given a neuron N, consider all its nodes as a point cloud in 3D space, viewed as charged particles. The charge each node carries is proportional to the thickness of the branch at that point. These charged nodes affect the space around them through the electric field they generate. This electric field is a well-defined map

EN : ℝ³ \ {nodes} → ℝ³.

Taking the intensity of the vector field at each point of ℝ³ \ {nodes} gives us a measure of how space is being affected by the neuron. This also gives a measure of how the nodes are distributed in space, as we will later illustrate in the case of Purkinje cells.
Let ζ := {P1,…, Pn} be the nodes of N, with Pi carrying charge qi. This charge qi is chosen to be the width of the dendrite at the point Pi. Each point Pi = (xi, yi, zi) of ζ contributes an electric vector field whose contribution is normalized to have length qi and which is of the form

Fi(x, y, z) := (x − xi, y − yi, z − zi) / ρi,

where ρi := ‖(x − xi, y − yi, z − zi)‖. By superposition, the node configuration N gives rise to a vector field EN(x, y, z) = Σi qi Fi(x, y, z) with square intensity

|EN(x, y, z)|² = Σi,j qi qj Fi(x, y, z) · Fj(x, y, z).
Let O(3) be the group of orthogonal 3 × 3 matrices. This group acts on ℝ³, and thus on the set of neurons. If A ∈ O(3) and N ⊂ ℝ³ is a neuron, we write A(N) for the image of N under this action.

Lemma. The intensity at the soma, |EN(0, 0, 0)|, is O(3)-invariant.
Proof. We show first that |EA(N)(P)| = |EN(A⁻¹(P))| for P ∈ ℝ³. The nodes of A(N) are {A(P1),…, A(Pn)}. Write EA(N)(P) = Σi qi EA(Pi)(P). Since EA(Pi)(P) = A EPi(A⁻¹(P)), and since A is linear and preserves lengths, it follows that

|EA(N)(P)| = |A EN(A⁻¹(P))| = |EN(A⁻¹(P))|.

At the soma P = (0, 0, 0), A⁻¹(P) = P, so that |EA(N)(P)| = |EN(P)|, which is what is claimed.
Our Sholl descriptor associates to every neuron the map

r ↦ |ENr(0, 0, 0)|,

where Nr is as in (4). This map adds the charge-weighted unit vectors at the soma, one for each node, and takes the magnitude. We can also think of energy as the effect of the nodal distribution around the soma. If all nodes are on one side of a plane going through the soma, then their contributions add up and the value is greatest (e.g. Purkinje cells have very large energy values), as opposed to nodes that are evenly distributed around the soma. In this latter case, several cancellations occur and the energy value tends to be small.
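In code, the descriptor reduces to summing charge-weighted unit vectors at the soma. The sketch below assumes soma-centered coordinates of the nodes (excluding the soma itself), their widths as charges, and nodes listed by increasing normalized radius.

    # Energy Sholl function: |E_{N_r}(0, 0, 0)| as nodes enter the ball.
    import numpy as np

    def energy_function(node_positions, charges, radii):
        P = np.asarray(node_positions, float)   # (n, 3), soma-centered
        q = np.asarray(charges, float)          # dendrite widths as charges
        units = P / np.linalg.norm(P, axis=1, keepdims=True)
        breaks, vals = [0.0], [0.0]
        for i, r in enumerate(radii):           # sorted, normalized radii
            E = (q[: i + 1, None] * units[: i + 1]).sum(axis=0)
            breaks.append(r)
            vals.append(float(np.linalg.norm(E)))
        return breaks, vals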
4.9.8. The Topological Morphological Descriptor (TMD)
Let T ⊂ ℝ³ be a tree with a root R. A path is any continuous sequence of edges in T. Each path x has a unique initial vertex b(x) and a unique terminal vertex d(x). The TMD is based on a method of decomposing a given tree T into a collection of paths whose union is the whole tree T. In addition, any two paths from that decomposition either have an empty intersection, or their intersection is the endpoint of one of them (and in this case a branching point of T).
The TMD path decomposition is obtained using the following procedure. All the paths in the TMD-path decomposition start at the leaves of T. They continue along the tree, towards the root, until they reach a node n of degree 3 or higher in T. At the node n, all the paths except one terminate. The path that continues through the node n is the one whose initial node is farthest away from R (the soma). Once a path reaches the root R, it does not continue any further (it terminates). An example of a TMD-path decomposition is presented in Figure 7a.
Figure 7: (a) Example of a TMD-path decomposition on a simple planar tree. The soma, marked with 1, is the root. Concentric circles reveal the distances of the nodes from the root; the furthest node is node 8. The paths of the TMD-path decomposition are {[5, 4, 2], [3, 2, 1], [8, 6, 1], [7, 6]}. (b) A tree T with a single path x starting at the root R. When using the TMD as a Sholl-type descriptor by considering the TMD of T ⋂ B(R, r), we only see the final barcode [0, d(R, 6)] for r ≥ d(R, 4). For radii r between d(R, 6) and d(R, 4), the endpoint of the persistence interval equals r. When r reaches d(R, 4), the endpoint of the persistence interval jumps down to d(R, 6).
Figure 8: Representative isomorphic trees with entirely different (a) branching pattern and (b) tortuosity descriptors.
Given a TMD-path decomposition as described above, we associate to it a collection of pairs of numbers inspired by persistence diagrams. For that purpose, a path x with initial and terminal vertices b(x) and d(x) corresponds to the persistence interval [d(R, b(x)), d(R, d(x))]. Using the terminology of persistent homology, we say that the path x is born at the radius d(R, b(x)) and dies at the radius d(R, d(x)). The collection of all such birth-death pairs is then used as a signature of the tree. As it has the same structure as a persistence diagram, we further adopt various metrics from persistent homology to compare such diagrams.
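A short sketch of this construction follows; it is our own illustrative code, with a simplified encoding of the tree as child lists and precomputed distances from the root. Ties at a bifurcation are broken arbitrarily, matching the footnote. The example distances for the tree of Figure 7a are hypothetical, chosen only to reproduce the decomposition listed in the caption.

```python
def tmd_diagram(root, children, dist):
    """Birth-death pairs of the TMD of a rooted tree (illustrative sketch).

    root     -- the root node (soma)
    children -- dict mapping a node to the list of its children
    dist     -- dict mapping a node to its distance from the root
    """
    pairs = []

    def walk(v):
        kids = children.get(v, [])
        if not kids:
            return dist[v]                  # a leaf: a new path is born here
        carried = sorted(walk(c) for c in kids)
        for d in carried[:-1]:              # every path except the one whose
            pairs.append((d, dist[v]))      # ... initial node is furthest dies here
        return carried[-1]                  # the furthest-born path continues

    pairs.append((walk(root), dist[root]))  # the surviving path dies at the root
    return pairs

# The planar tree of Figure 7a; distances are hypothetical but chosen to
# reproduce the decomposition {[5,4,2], [3,2,1], [8,6,1], [7,6]} of the caption.
children = {1: [2, 6], 2: [3, 4], 4: [5], 6: [7, 8]}
dist = {1: 0.0, 2: 1.0, 3: 3.2, 4: 2.0, 5: 3.0, 6: 1.5, 7: 2.5, 8: 4.0}
print(tmd_diagram(1, children, dist))
# -> [(3.0, 1.0), (2.5, 1.5), (3.2, 0.0), (4.0, 0.0)]
```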
To fit the TMD into the scheme of the current paper we will now turn it into a Sholl descriptor having values in the space of persistence diagrams. For that purpose let Tr be the connected component of T ⋂ B(R, r) containing R. Let us make two simple observations:
Note that the TMD-path decomposition of T restricted to T_r is a valid TMD-path decomposition of T_r. To see this, consider a branching node n in T such that d(R, n) ≤ r, and the same node in T_r. The paths from n to the leaves in T_r will either be the same as in T, or they will be cut short in T_r by B(R, r). In both cases, the path that does not terminate at n in T will also be the path with the initial point furthest away from R in T_r. It is possible that more than one path in T_r joining at n is cut short by B(R, r). However, in this case we can choose to continue in T_r the same path that continues in T. Consequently, a TMD-path decomposition of T_r can be obtained by appropriate restriction of the TMD-path decomposition of T.
Suppose a path x in T gives rise to the persistence interval [d(R, b(x)), d(R, d(x))]. Then the path x is present in T_r for r ≥ d(R, d(x)). However, it may happen that x contains points that are further away from R than d(R, b(x)), and x will be cut at those points by B(R, r). This happens when the path x turns around, as presented in Figure 7b. In that instance, for certain values of r, the interval corresponding to x in T_r will have a larger value of the first (birth) coordinate than the corresponding interval in T. The actual value can be obtained from the coordinates of the degree-2 vertices of x.
These two observations allow for quick computation of the Sholl version of the TMD descriptor, i.e. the TMD of a tree T_r, also denoted TMD(T, r). First, the TMD-path decomposition P of T is computed. Subsequently, for a given radius r, the subset P′ ⊂ P containing all paths x such that d(R, d(x)) ≤ r is selected. Each path x ∈ P′ is traversed to find the point f_x on it which lies inside B(R, r) and is furthest away from its center. Once found, the pair (d(R, f_x), d(R, d(x))) is added to TMD(T, r).
The algorithm described above uses the radial distance from the soma to construct T_r. When an intrinsic (path) distance is used instead, both in the TMD and in the construction of T_r, the Sholl version of the descriptor is even easier to obtain, as each path x ∈ P with d(R, d(x)) ≤ r gives rise to the pair (min(d(R, b(x)), r), d(R, d(x))) in TMD(T, r).
Unlike the other Sholl descriptors, TMD(T, r) takes values in the space of persistence diagrams, which is a much richer mathematical structure than the real numbers. Yet it is still possible to compute distances between the functions TMD(T, r) and TMD(T′, r). Let us assume that both functions have been computed for a discrete set of values 0 = r_0 < r_1 < r_2 < ⋯ < r_n. Then a distance between TMD(T, r) and TMD(T′, r) can be approximated by

∑_{i=1}^{n} d_diag(TMD(T, r_i), TMD(T′, r_i)) (r_i − r_{i−1}),

where d_diag denotes any distance between persistence diagrams, e.g. the p-Wasserstein distance.
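A minimal sketch of this approximation, assuming the diagrams TMD(T, r_i) have already been computed on a common grid; any diagram distance can be passed in (for instance, the wasserstein or bottleneck functions of the persim Python package):

```python
def sholl_tmd_distance(radii, dgms_T, dgms_Tp, ddiag):
    """Approximate distance between the Sholl-TMD functions of two trees.

    radii   -- common grid 0 = r_0 < r_1 < ... < r_n
    dgms_T  -- persistence diagrams TMD(T, r_i), one per grid point
    dgms_Tp -- persistence diagrams TMD(T', r_i), one per grid point
    ddiag   -- any distance between persistence diagrams, e.g.
               persim.wasserstein or persim.bottleneck
    """
    return sum(ddiag(dgms_T[i], dgms_Tp[i]) * (radii[i] - radii[i - 1])
               for i in range(1, len(radii)))
```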
4.10. Stability
In this subsection we define stability and then verify that all our Sholl descriptors are stable. We address this issue by verifying that our descriptors are not overly sensitive to small perturbations of input neurons. More precisely, if two reconstructions of the same neuron vary slightly, they will result in different tree representations. A descriptor is "stable" if, when applied to either tree, it gives results that also vary only slightly (i.e. the variation is controlled).
Two different reconstructions of the same neuron produce two different trees embedded in ℝ3. All reasonable reconstruction schemes should produce isomorphic trees, with the same number of primary dendrites and the same number of bifurcations. We can measure the distance between two reconstructed trees in the Hausdorff metric and use it as a measure of closeness. Two such reconstructions are expected to be close in the Hausdorff metric, which suggests requiring that our descriptors depend "continuously" on this metric. However, this is not a good notion of closeness, as trees that are very close in the Hausdorff metric may still have very different morphological properties (such as lengths of branches, number of nodes, etc.). Figure 9 illustrates two trees (b) and (c) that are close, in the Hausdorff sense, to the initial tree (a); note that the tree in (a) is represented in gray, overlapping with the tree in (b), and is depicted below the main branches in (c). Clearly the trees (b) and (c) have distinct morphological features compared to the tree (a), and they should not be considered similar.
Figure 9: Representative tree (a) and trees (b) and (c) that are close to tree (a) in the Hausdorff metric.
Our next definition is adapted from ([45], §2), where it is utilized for rectifiable curves in the context of knot theory. We will assume that the dendrites are piecewise smooth paths in ℝ3, meaning that the branches between nodes can be parameterized as C1-differentiable paths in space.
We say that two neurons N and N′ are (δ, θ)-close if N′ can be obtained from N by a smooth one-to-one map Ψ supported on an open neighborhood U of N, so that corresponding points x and Ψ(x) are within δ of each other, and the norm differences satisfy ∥v − dΨ_x(v)∥ < θ for all x ∈ U and v ∈ T_xℝ3, where dΨ_x is the differential of Ψ at x. We recall that this is a linear map between tangent spaces dΨ_x : T_xℝ3 → T_{Ψ(x)}ℝ3 mapping a vector v to Jac_x(Ψ)(v), where Jac_x(Ψ) is the 3 × 3 Jacobian matrix of partial derivatives evaluated at x. Let us now make this construction a bit more precise. For the sake of simplicity we will assume Ψ is defined on all of ℝ3.
We say that N and N′ are (δ, θ)-close if there exists an ambient diffeomorphism Ψ : ℝ3 → ℝ3 such that |Ψ(x) − x| < δ for all x, and the Frobenius norm ∥I − Jac_xΨ∥ < θ for every x ∈ ℝ3. In contrast with the definition in [45], we require not only that the angles between corresponding vectors be close, but also their norms. This is precisely the content of the inequality ∥I − Jac_xΨ∥ < θ.
Let us observe that our definition relates to the C1-topology of functions in the following way. If we view a branch γ of N as a smooth path [0,1] → ℝ3, then it is (δ, θ)-close to Ψ ∘ γ if both paths are C1-close. The definition of C1-closeness does not involve the existence of Ψ, so it is easier to work with, but it cannot be defined globally on neurons, as opposed to individual branches, since neuronal trees are not manifolds.
We now define “stability”.
A Sholl descriptor ϕ is stable if for any ϵ > 0 there exists η > 0 such that, whenever δ < η, θ < η and the neurons N and N′ are (δ, θ)-close,

d_ϕ(N, N′) < ϵ.
According to this definition, a small perturbation or deformation of the neuron which “moves the points by as little as δ” and “distorts the branches by as little as θ”, yields a small change in the descriptor ϕ.
Let N be a neuron represented as a spatial tree, and let ϕ be a Sholl descriptor. The nodes of N are sorted according to increasing distances from the soma,

r_1 ≤ r_2 ≤ ⋯ ≤ r_k.

These distances are radial or dendritic depending on the descriptor. We make the assumption that a deformation of a neuron does not introduce new bifurcations, and so the leaf index is completely unchanged by deformation. It is evidently stable.
We start by verifying the stability of the branching pattern descriptor. This descriptor is based only on the distribution of nodes, so only the constant δ matters. Let us move the nodes of N by at most δ: there is a homeomorphism Ψ : N → N′ taking each node P_i to a node P_i′ so that |P_i − P_i′| < δ for all i. Write r_i and r_i′ for the corresponding distances from the soma. We ensure that δ < min_i |r_i − r_{i+1}|, so that the functional distance formula (3) takes the form

d_ϕ(N, N′) = ∑_i |ϕ_N(t_i) − ϕ_{N′}(t_i)| (t_{i+1} − t_i),

where the t_i enumerate, in increasing order, the distances in {r_1,…, r_k} ∪ {r_1′,…, r_k′}. Each interval (t_i, t_{i+1}) either lies between some r_j and its perturbation r_j′, or it is disjoint from all such intervals. In the first case |ϕ_N(t_i) − ϕ_{N′}(t_i)| = 1 and t_{i+1} − t_i < δ, and in the other case the difference is 0. This means that

d_ϕ(N, N′) ≤ kδ,

where k is the number of nodes. By choosing δ small enough, this is less than any desired ϵ.
The taper and energy descriptors likewise depend only on the node distribution, and are thus stable. Note that for the taper rate we assume that the width of a dendrite at a given node is the same in any reconstruction (this is not a varying feature in our definition of stability), so this descriptor too depends only on the nodes, and it is stable.
To see the stability of the TMD with the radial distance to the soma, observe that the TMD-path decomposition may change when the positions of nodes are perturbed. This happens when the endpoints of two paths that merge at a bifurcation point b are at almost the same distance from the soma. In this case a perturbation of the endpoints of those paths may result in swapping the branch that continues through b with the one that terminates there. However, since the TMD only records distances from the soma, the endpoints of the persistence intervals move by at most δ, which directly translates into stability of the descriptor.
As for the tortuosity descriptor, we recall that T associates to every N and r ∈ [0,1] the average tortuosity of N_r, the connected component of N containing the soma inside the ball of radius r. The stability of T holds under one condition, and so we refer to it as "conditional stability". We discuss this condition next and remark that it is almost always satisfied, so that in practice and generically T behaves stably. To understand this condition, observe that instability can occur in the situation illustrated in Figure 10.
Figure 10: Instability behavior.
Method used to determine detection rate. Each circle is the boundary of a disk in the Euclidean metric.
Representative dendrograms based on the combination of metrics. (a) Combined dendrogram for dataset 4 and (b) dataset 5.
Here r is chosen to be the radial distance of the node A. The twisted dendrite touches the sphere tangentially at B. When measuring the tortuosity of N_r, we only consider the term δ_SB of (5), the tortuosity from the soma S to B considered as a leaf. If a small perturbation causes the branch to lie inside the sphere, this term is replaced by the term δ_SC, the tortuosity of the entire branch inside B(r) from S to C. This leads to a sudden increase in tortuosity which can potentially lead to instability. However, this only happens if a sphere through a node is tangent to a branch, which is a rare instance.
Let γ : [0,1] → ℝ3 be a smooth space curve. Then its length is given by

ℓ(γ) = ∫_0^1 |γ′(t)| dt.

We say that two paths γ_1 and γ_2 are (δ, θ)-close if there is a (δ, θ)-diffeomorphism Ψ taking γ_1 to γ_2; that is, γ_2 := Ψ ∘ γ_1.
Lemma 4.8. Let γ_1 be a given smooth curve in ℝ3. For every ϵ > 0 there exists η > 0 such that, for δ < η and θ < η, every curve γ_2 that is (δ, θ)-close to γ_1 satisfies |ℓ(γ_2) − ℓ(γ_1)| < ϵ.
Proof. Let γ_2 be (δ, θ)-close to γ_1, meaning there is a (δ, θ)-diffeomorphism Ψ taking γ_1 to γ_2. Since γ_1 is differentiable, |γ_1′(t)| is bounded uniformly on [0,1], say by M > 0. We can write

|ℓ(γ_2) − ℓ(γ_1)| ≤ ∫_0^1 |(Ψ ∘ γ_1)′(t) − γ_1′(t)| dt = ∫_0^1 |(Jac_{γ_1(t)}Ψ − I)(γ_1′(t))| dt ≤ θM,

where we know by definition that ∥I − Jac_tΨ∥ ≤ θ on the domain of Ψ. By choosing η < ϵ/M, we obtain our claim.
Let Ψ be a (δ, θ)-diffeomorphism taking N to N′. The map Ψ necessarily maps nodes to nodes and branches to branches. Lemma 4.8 shows that by controlling (δ, θ) we can control the lengths of branches. It follows that T is stable away from the stated condition: if the spheres through the nodes of N and N′ are not tangent to any dendrite, then d_T(N, N′) = d_T(N, Ψ(N)) < ϵ for any chosen ϵ > 0, once δ and θ are chosen sufficiently small.
Finally, we discuss the stability of the flux descriptor. Stability in this case hinges on controlling the variation of angles under any deformation of the neuron. This is a direct consequence of the following lemma.
Let v_1, v_2 ∈ T_xℝ3 be unit vectors, x ∈ N (typically x will be a node in our case). Then for any ϵ > 0 there exists η > 0 such that, for any (δ, θ)-diffeomorphism Ψ with θ < η,

|∠(v_1, v_2) − ∠(dΨ_x(v_1), dΨ_x(v_2))| < ϵ.
Proof. We fix x and drop it from the notation. For a nonzero vector v we write v̂ = v/|v| for the normalized vector. Using the cosine-angle formula for unit vectors u and w,

cos ∠(u, w) = ⟨u, w⟩ = 1 − ½|u − w|²,

we can write

cos ∠(v_1, v_2) = 1 − ½|v_1 − v_2|² and cos ∠(dΨ(v_1), dΨ(v_2)) = 1 − ½|v̂_1′ − v̂_2′|²,

where v̂_i′ denotes the normalization of dΨ(v_i). By taking the difference, we see immediately that we can make the angles ∠(v_1, v_2) and ∠(dΨ(v_1), dΨ(v_2)) arbitrarily close by making their cosines arbitrarily close, or equivalently by making |dΨ(v_1) − dΨ(v_2)| and |v_1 − v_2| arbitrarily close. We check this last part.
Since for any unit vector v ∈ T_xℝ3 we have |dΨ(v) − v| < θ, we can extend the string of inequalities above to

| |dΨ(v_1) − dΨ(v_2)| − |v_1 − v_2| | ≤ |dΨ(v_1) − v_1| + |dΨ(v_2) − v_2| < 2θ.
By making θ small, the left term can be made arbitrarily small and this is enough to yield our claim.
4.11. The Detection Algorithm
The detection construction and its motivation were discussed in §4.7. Supplementary Fig. 11 illustrates the construction on three classes labeled A, B and C.
In this section we provide a simple algorithm to estimate the level of detection of a Sholl descriptor ϕ on a set of neurons divided into classes C1,…, Cn. Recall that associated to ϕ we have a metric dϕ on the set of neurons (1). To compute the level of detection of C1, we proceed as follows:
Assume C1 = {N1, N2,…, Nm} has m neurons.
Pick a neuron from the class C_1, one at a time; we first describe the procedure for N_1.
Look at all distances d_ϕ from N_1 to all neurons in all the classes. Divide the obtained distances into two sets: the distances from N_1 to the neurons of C_1 (the internal distances), and the distances from N_1 to all neurons not in C_1 (that is, in C_2 up to C_n; the external distances).
Order all internal distances increasingly: d_1 = d_ϕ(N_1, N_1) = 0 < d_2 ≤ … ≤ d_m.
Now proceed iteratively: let ϵ = d_m and consider all external distances smaller than d_m. Count them, say there are c_m of them. Then we have two ratios:

m/m = 1 ↔ 100% (meaning all neurons in C_1 are within distance d_m from N_1, which is of course the case), and

m/(m + c_m) (the percentage of neurons from the class C_1 among all neurons that are within distance d_m from N_1).

In this case, the smaller of the two is β_m = m/(m + c_m).
Consider next the ball of radius d_{m−1} centered at N_1, and let c_{m−1} be the number of external neurons within that ball. We set β_{m−1} to be the smaller of (m−1)/m and (m−1)/((m−1) + c_{m−1}).
Proceed iteratively the same way through ϵ = d_{m−2},…, d_2 and obtain rates β_{m−2},…, β_2. One defines

β(N_1) = max{β_2,…, β_m}.
Repeat the same procedure for N_2,…, N_m to obtain the proportions β(N_i) for 1 ≤ i ≤ m. The level of detection of C_1 by the descriptor ϕ is then simply

det_ϕ(C_1) = max{β(N_1),…, β(N_m)}.
For instance, if we obtain det_ϕ(C_1) ≥ 75%, this implies there is a ball in the d_ϕ metric that contains at least 75% of the neurons of C_1 and within which at least 75% of all neurons are from C_1. A detection rate of 100% means perfect detection, whereby some ball contains all of C_1 and no neurons from the other classes C_2,…, C_n.
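The procedure translates directly into code. Below is a minimal sketch (our own, assuming a precomputed matrix of pairwise d_ϕ distances and class labels; the convention β = 1 for a single-neuron class is ours):

```python
import numpy as np

def detection_rate(D, labels, cls):
    """Level of detection det_phi(C) of the class `cls`.

    D      -- (N, N) matrix of pairwise descriptor distances d_phi
    labels -- length-N array of class labels
    cls    -- label of the class whose detection level is computed
    """
    labels = np.asarray(labels)
    members = np.where(labels == cls)[0]
    m = len(members)
    betas = []
    for i in members:                            # each N_i in the class in turn
        internal = np.sort(D[i, members])        # d_1 = 0 <= d_2 <= ... <= d_m
        external = D[i, labels != cls]
        rates = []
        for j in range(1, m):                    # balls of radius d_2, ..., d_m
            c = np.count_nonzero(external < internal[j])
            covered = (j + 1) / m                # share of the class in the ball
            purity = (j + 1) / (j + 1 + c)       # share of the ball in the class
            rates.append(min(covered, purity))
        betas.append(max(rates) if rates else 1.0)   # beta(N_i)
    return max(betas)

# Hypothetical toy data: classes 0 and 1, five neurons, precomputed distances.
D = np.array([[0, 1, 1, 5, 6],
              [1, 0, 2, 5, 6],
              [1, 2, 0, 5, 6],
              [5, 5, 5, 0, 1],
              [6, 6, 6, 1, 0]], dtype=float)
print(detection_rate(D, [0, 0, 0, 1, 1], 0))     # -> 1.0 (perfect detection)
```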
4.12. Combination of Descriptors and Classification
Our objective is to build a toolbox of distances d1,…, dn that can be used to discriminate among trees (classes of neurons) according to a given morphological feature. One aim is to understand, for two or more classes of trees, which morphological features differentiate them. For example, C1 can be a class that represents neurons from an experimental group with a neurological disease, while C2 is a class of neurons from a control group. In this regard, one may want to know which morphological features are different between these two classes.
4.13. Vectorization: Unsupervised Classification
The starting point is a set C = {N_1,…, N_k} of neurons. Given a descriptor ϕ and a neuron N, consider its area value ∫_0^1 ϕ_N(r) dr. The extreme values of ϕ_N are also meaningful. We will assume ϕ is one of the descriptors below:
Branching: then ϕ_N(1) is the opposite of the number of primary branches.
Tortuosity: then ϕ_N(1) is the average tortuosity of N.
Leaf index: then ϕ_N(0) is the total number of leaves of N.
Energy (respectively flux and wiring): then ϕ_N(1) is the total energy at the soma (respectively the total flux and the total wiring of N).
We can list all of our descriptors ϕ_1,…, ϕ_g, with ϕ_1 being the leaf index, and associate to N the vector

v(N) = (∫_0^1 ϕ_{1,N}, ϕ_{1,N}(0), ∫_0^1 ϕ_{2,N}, ϕ_{2,N}(1), …, ∫_0^1 ϕ_{g,N}, ϕ_{g,N}(1)) ∈ ℝ^{2g},

with one pair of entries (area value, extreme value) per descriptor.
If the data for a descriptor are not available (e.g. taper rate), then the corresponding pair of entries is omitted from the vector. For Dataset 1, we vectorized based on seven descriptors (all but TMD-Sholl and taper rate).
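A sketch of this vectorization step follows; it is our own illustrative code, assuming each descriptor is supplied as a function returning its step function on a grid (or None when data are unavailable), together with the endpoint, 0 or 1, at which its extreme value is read off:

```python
import numpy as np

def vectorize(neuron, descriptors):
    """Vector of (area value, extreme value) pairs for a neuron.

    descriptors -- list of (sholl_fn, extreme_at) pairs; sholl_fn maps a
                   neuron to (grid, values) for the step function, or to None
                   when the data are unavailable (e.g. taper rate), and
                   extreme_at is the endpoint (0.0 or 1.0) at which the
                   descriptor's extreme value is read off.
    """
    v = []
    for sholl_fn, extreme_at in descriptors:
        out = sholl_fn(neuron)
        if out is None:
            continue                         # drop the pair for missing data
        grid, values = out
        area = np.trapz(values, grid)        # area value of the descriptor
        extreme = values[0] if extreme_at == 0.0 else values[-1]
        v.extend([area, extreme])
    return np.array(v)
```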
4.14. Combination of Metrics
Here we present a greedy grid-based search procedure that can reveal which features are 'fundamentally' different between classes. For that purpose, we consider a new distance d given by the linear combination

d = ∑_{i=1}^{n} α_i d_i, with α_i ≥ 0.
We consider constants α_i sampled from a uniform grid. For a fixed choice of α_1,…, α_n, let I_k = max_{x,y∈C_k} d(x, y) be the maximal distance d between objects in C_k, for k ∈ {1, 2}, and let E = min_{x∈C_1, y∈C_2} d(x, y) be the minimal distance d between elements from the two different classes. We then consider the ratio

sc_{α_1,…,α_n} = E / max(I_1, I_2),

and select the α_1,…, α_n from our grid of points that maximize sc_{α_1,…,α_n}. Note that this greedy grid search has complexity exponential in n, the number of considered distances.
The obtained weights α_1,…, α_n give an idea of the relative importance of the different distances in the separation of the classes C_1 and C_2. Consequently, when the distances can be interpreted geometrically, the weights may help identify the geometric features that are important in the separation of the classes, and those that are not.
This idea can be generalized to a multi-class problem. There are various ways this can be achieved; we present only one approach. Given k classes C_1,…, C_k, define I_j = max_{x,y∈C_j} d(x, y) (the maximal internal distance in C_j) and E_{i,j} = min_{x∈C_i, y∈C_j} d(x, y) (the minimal distance between the classes C_i and C_j). Then the multi-class score is given by

sc_{α_1,…,α_n} = min_{i≠j} E_{i,j} / max_j I_j.
The remainder of the procedure described above is unchanged. We can check that a separation is meaningful by conducting a "permutation test" (see §4.16).
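For concreteness, here is a minimal sketch of the two-class grid search, assuming the n distance matrices have been precomputed; the variable names and the grid resolution are our own choices, and the cost exponential in n is apparent in the product over the grid:

```python
import itertools
import numpy as np

def grid_search_weights(dists, labels, grid=np.linspace(0.0, 1.0, 5)):
    """Grid search for weights alpha_i maximizing the separation score.

    dists  -- list of n precomputed (N, N) distance matrices d_1, ..., d_n
    labels -- length-N array with values 0 and 1 (the two classes)
    grid   -- candidate values for each alpha_i; the search visits
              len(grid)**n combinations, i.e. it is exponential in n
    """
    labels = np.asarray(labels)
    in0, in1 = labels == 0, labels == 1
    best_score, best_alpha = -np.inf, None
    for alpha in itertools.product(grid, repeat=len(dists)):
        if not any(alpha):
            continue                              # skip the all-zero combination
        d = sum(a * D for a, D in zip(alpha, dists))
        I = max(d[np.ix_(in0, in0)].max(), d[np.ix_(in1, in1)].max())
        E = d[np.ix_(in0, in1)].min()
        score = E / I if I > 0 else np.inf        # the ratio sc_alpha
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_alpha, best_score
```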
4.15. Metric Learning: Supervised Classification
Starting with classes of neurons C1,…, Ck, a Sholl feature ϕ, and a given random neuron N, we wish to know how close in feature is N to any of these classes, or in other words, how much of the feature ϕ does N share with any of these classes?
More generally, given classes C_1,…, C_k, a neuron N, and a number of features ϕ_1,…, ϕ_m, we can use the combined effect of the features to "classify" the C_i's. Our approach is to devise an optimal metric D_{ϕ_1,…,ϕ_m} such that N is close to the class C_i if its distance under this metric to this class is small (relative to its distances to the other classes). This is accomplished via the relatively recent technique of metric learning (see [44]), machine-implemented in [30]. We explain how this is achieved and successfully used to solve the following two problems:
(i) Under the learned metric D_ML = D_{ϕ_1,…,ϕ_m}, classes that share the largest number of features (i.e. are within shorter Sholl distances) are closer than those that share fewer or none.
(ii) Given a neuron N picked from outside our dataset, the metric-learned D_ML determines with good confidence into which class this neuron fits best.
The starting point is a set of vectors v_i ∈ ℝ^n, each of which is labeled by an integer ℓ_i. Ideally one would hope that the Euclidean metric "separates" the classes, meaning that vectors in the same class are close while those in different classes remain relatively distant. This is hardly the case in practice, so one seeks a modification of the Euclidean metric which has this separation property. A standard approach is to introduce a positive-definite n × n "matrix of weights" M, so that

D_ML(u, v) = √((u − v)ᵀ M (u − v))

defines a new metric on ℝ^n (a so-called Mahalanobis metric) with better separating properties with respect to the chosen classes. More precisely, M is chosen so that D_ML maximizes the sum of distances between points with different labels while keeping the sum of distances between points with the same label small. Note that since M can be written as LᵀL, the associated D_ML metric has the following interpretation: it is the distance obtained by first moving vectors via L in ℝ^n, then taking their Euclidean distance. Various machine learning algorithms have been implemented to find this optimal M. This approach is entirely supervised, since we need the classes to train the matrix entries and thus the metric.
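As an illustration only (not the implementation of [30]), the sketch below uses scikit-learn's NeighborhoodComponentsAnalysis as a stand-in metric-learning algorithm: it learns a linear map L from the labeled vectors, and the learned metric is the Euclidean distance after applying L, i.e. M = LᵀL:

```python
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis

def learn_metric(X, y):
    """Learn a Mahalanobis-type metric D_ML from labeled vectors.

    X -- (N, n) array of vectors v_i;  y -- length-N array of labels l_i.
    Returns a distance function; internally M = L^T L for the learned map L.
    """
    nca = NeighborhoodComponentsAnalysis(random_state=0).fit(X, y)
    L = nca.components_                       # the learned linear map
    def d_ml(u, v):
        return float(np.linalg.norm(L @ (np.asarray(u) - np.asarray(v))))
    return d_ml
```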
In the context of this paper, the given data are neurons distributed among chosen classes, say C_1, C_2,…, C_k. Typically, a class C comprises neurons from a particular brain region, experimental condition, or developmental stage, so that all neurons in the class share a desired property. Let Ω = {ϕ_1,…, ϕ_m} be a family of Sholl descriptors. For ϕ_j ∈ Ω and a neuron N, we can consider the average distance of N to C_i for the descriptor ϕ_j, given by

d̄_{ϕ_j}(N, C_i) = (1/|C_i|) ∑_{N′∈C_i} d_{ϕ_j}(N, N′).
As ϕ_j runs over Ω, we obtain the vector in ℝ^{mk} (here n = mk)

v(N) = (d̄_{ϕ_1}(N, C_1),…, d̄_{ϕ_1}(N, C_k),…, d̄_{ϕ_m}(N, C_1),…, d̄_{ϕ_m}(N, C_k)),   (7)

where the first k entries give the average distances of N to all k classes in the metric d_{ϕ_1}, and the last k entries give the average distances of N to all k classes in the metric d_{ϕ_m}. Notice that each neuron N comes with a unique label i if N ∈ C_i. The vector in (7) depends on the ordering of the C_i's and the ϕ_j's, but the final outcome does not. In all cases, and for the remaining constructions in this section, an order on the classes C_1,…, C_k and on the descriptors ϕ_1,…, ϕ_m is chosen before we run any algorithm, and this order is preserved throughout the process.
Starting with a dataset of neurons and given m descriptors, we obtain a set of labeled vectors in ℝ^{mk}, one for each neuron. The class C_i of neurons corresponds to a class of vectors, with labels in {1, 2,…, k}. This setup is precisely what metric learning (ML) requires, and the data can be run through a supervised ML algorithm [30]. The end result is a metric D_ML, depending on d_{ϕ_1},…, d_{ϕ_m}, which differentiates between the classes.
The solutions to the two problems (i) and (ii) raised at the beginning of this section are now evident. They are summarized below.
The Classification Scheme: Start with k classes C_1,…, C_k and m descriptors ϕ_1,…, ϕ_m.
Run metric learning on the vectorized classes to obtain a new metric DML. The new metric is validated after being tested for “overfitting” (see §4.16). A good metric gives good separation of the vectorized classes.
Given a neuron N, vectorize it as in (7) and compute its average distances d̄_{ϕ_j}(N, C_i) to the classes. These distances are then compared.
The neuron N shares the given features most with a particular class C_j if its distance to C_j under D_ML is smallest among its distances to all the classes. This can be phrased in terms of percentages.
(Feature selection) Run each descriptor on the classes. If the detection rates of a descriptor are lower than 80% on all classes, the descriptor is considered "noisy" and is excluded. Repeat the process above with the non-noisy descriptors.
4.16. Overfitting
Both grid search §4.14 and metric learning §4.15 provide efficient tools to differentiate classes of neurons. Yet, the fact that these methods return clear separation between two or more classes of trees is not sufficient to conclude that the separation is geometrically meaningful.
Example 4.10
Let us consider the four vertices of a square: A = (1, 1), B = (−1, 1), C = (−1, −1) and D = (1, −1). Suppose that the first class consists of the points A and D, while the second class is composed of the points B and C. Metric learning will seek to place elements of different classes far apart and those of the same class close together. This can be achieved by a weighted Euclidean metric that puts a large weight on the first coordinate and a small weight on the second. Such a perturbed Euclidean metric can be obtained both by the grid search and by metric learning. Yet it is clear that the division into the first and second class is somewhat arbitrary; in fact, putting points A and B in the first class and C and D in the second is geometrically entirely similar. The separation of these alternative classes can be achieved by the same kind of distance function with the large and small weights swapped, and can therefore also be found by the methods we present here.
In order to detect cases similar to the one described in Example 4.10, we use a procedure similar to a permutation test. Namely, after obtaining a separation of the given classes, we repeatedly permute the labels of all data points and rerun the grid search / metric learning on the data with permuted labels. We then check how frequently a good separation of the permuted labels is obtained. If this happens often, then the separation of the initial classes is not valid. If it does not, then we have additional verification that the separation of the original classes is meaningful.
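A minimal sketch of this check (our own version) follows; here score_fn stands for whichever separation score is being validated, e.g. the grid-search score of §4.14:

```python
import numpy as np

def permutation_test(score_fn, labels, n_perm=100, seed=0):
    """Fraction of random relabelings scoring at least as well as the original.

    score_fn -- maps a label vector to a separation score (e.g. the grid
                search score of 4.14 or a metric-learning separation score)
    labels   -- the original class labels
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = score_fn(labels)
    hits = sum(score_fn(rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return hits / n_perm        # an empirical p-value for the separation
```

A large returned fraction flags the kind of spurious separation illustrated in Example 4.10.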
4.17. Software
MATLAB v2019a (The MathWorks Inc., Natick, MA; RRID: SCR_001622), Python v3.7 (Python Software Foundation; RRID: SCR_008394) and R v4.0.3 (R Foundation for Statistical Computing; RRID: SCR_010279) were used for computations. Code for the data analysis is available at https://github.com/reemkhalilneurolab/morphology
5. Code availability
Code for the data analysis was deposited in a GitHub repository and is available at https://github.com/reemkhalilneurolab/morphology
6. Funding
This work is supported by grants from the Biosciences and Bioengineering Research Institute (BBRI) and Faculty Research Grant (FRG), American University of Sharjah (AUS). P.D. acknowledges the support of Dioscuri program initiated by the Max Planck Society, jointly managed with the National Science Centre (Poland), and mutually funded by the Polish Ministry of Science and Higher Education and the German Federal Ministry of Education and Research.
7. Competing interests
The authors declare no competing financial interests.
8. Author contributions
S.K., R.K., and A.F. conceptualized the project; S.K., A.F., and P.D. designed the computational analysis; P.D. and A.F. performed all the coding; S.K., R.K., P.D., and A.F. interpreted the results; R.K. and S.K. wrote the initial draft; S.K., R.K., P.D., and A.F. revised the initial draft and wrote the final paper.
Footnotes
1 In case two or more paths satisfy this condition, the one that continues is picked at random.