Pairwise Relative Distance (PRED) is an intuitive and robust metric for assessing vector similarity and class separability

Scientific studies often require assessment of similarity between ordered sets of values. Each set, containing one value for every dimension or class of data, can be conveniently represented as a vector. The commonly used metrics for vector similarity include angle-based metrics, such as cosine similarity or Pearson correlation, which compare the relative patterns of values, and distance-based metrics, such as the Euclidean distance, which compare the magnitudes of values. Here we evaluate a newly proposed metric, pairwise relative distance (PRED), which considers both relative patterns and magnitudes to provide a single measure of vector similarity. PRED essentially reveals whether the vectors are so similar that their values across the classes are separable. By comparing PRED to other common metrics in a variety of applications, we show that PRED provides a stable chance level irrespective of the number of classes, is invariant to global translation and scaling operations on data, has high dynamic range and low variability in handling noisy data, and can handle multi-dimensional data, as in the case of vectors containing temporal or population responses for each class. We also found that PRED can be adapted to function as a reliable metric of class separability even for datasets that lack the vector structure and simply contain multiple values for each class.


Vectors are ubiquitous data structures. As a result, the assessment of vector similarity is one of the most frequently performed data operations in diverse areas of science and engineering. To list examples within only biology, vector similarity has been used to show that reef fish species in different ecoregions resemble each other in traits, not taxonomy or phylogeny (McLean et al., 2021); that cancerous cell lines' gene expression patterns cluster according to their tissue of origin and cancer stage (Ross et al., 2000); and that certain brain regions have similar fMRI brain activation patterns over time, suggesting they are functionally connected (Sasai et al., 2021).

Common metrics for vector similarity include Pearson's correlation, cosine similarity, and Euclidean distance. Distance-based metrics, like Euclidean distance or Manhattan distance, compare the magnitude of difference between the values in the two vectors. On the other hand, angle-based metrics, like cosine similarity or Pearson's correlation, compare the relative pattern of values within one vector with that in another vector. To take a straightforward example, consider the vectors [1 2 3] and [10 20 30]. A distance-based metric would call them different, while an angle-based metric would call them very similar. On the other hand, the vectors [1 2 3] and [3 2 1] would be described as relatively similar by the distance-based metrics and dissimilar by the angle-based metrics. Both types of metrics provide useful and complementary information; however, in practice, multiple metrics are rarely used together. In many applications, instead of choosing between one of the two types of metrics, it would be desirable to combine the similarity in the magnitudes and the similarity in the relative patterns into a single, reliable indicator of vector similarity.
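The contrast between angle-based and distance-based metrics in the example above can be checked in a few lines. This is an illustrative sketch; `cosine_similarity` is a local helper defined here, not a library function:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product divided by
    # the product of their norms.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a, b, c = [1, 2, 3], [10, 20, 30], [3, 2, 1]

# [1 2 3] vs [10 20 30]: same pattern, very different magnitudes
print(cosine_similarity(a, b))             # ~1.0 (angle-based: very similar)
print(np.linalg.norm(np.subtract(a, b)))   # ~33.7 (distance-based: different)

# [1 2 3] vs [3 2 1]: similar magnitudes, opposite pattern
print(np.corrcoef(a, c)[0, 1])             # ~-1.0 (Pearson: opposite pattern)
print(np.linalg.norm(np.subtract(a, c)))   # ~2.83 (distance-based: similar)
```

Note that cosine similarity for [1 2 3] vs [3 2 1] is still around 0.71; Pearson's correlation, which centers the vectors first, reports the opposite pattern as -1.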
We recently devised a metric, called Pairwise Relative Distance (PRED), to quantify the level of similarity in different individuals' neuronal responses to the same set of odors (Mittal et al., 2020). PRED captured the similarities both in the absolute values and the across-odor patterns of the responses and provided more intuitive values of similarity than correlation in quantifying stereotypy in sensory responses (Mittal et al., 2020). These initial results led us to ask whether PRED could serve as a general-purpose metric for analyzing vector similarity in different types of datasets.

Here, we generalize PRED as a robust metric for assessing vector similarity and class separability. Using simulations and experimental data, we show the advantages of PRED over the commonly used metrics and demonstrate its reliability in analyzing noisy or incomplete data. We illustrate PRED's ability to capture the similarity in temporal or population-level data while preserving the dataset's structure. Although we illustrate the usefulness of PRED using examples from the olfactory system, one can use PRED equally well in other sensory modalities in neuroscience, in non-neuroscience biological fields like the examples described above, and in non-biological fields like machine learning. Overall, our results present Pairwise Relative Distance as a reliable metric of similarity or separability in neuroscience and beyond.

PRED as a general metric for vector similarity

In this work, we generalize PRED to all datasets that can be expressed as a matrix, whose columns are specific classes (dimensions) and rows are the vectors being compared; we will refer to this organization as the class-vector structure (Figure 1a). For example, consider the responses of different retinal neurons to the same set of visual stimuli. In this case, each visual stimulus can be considered a class (column) and each neuron (row) a vector of responses to the different classes (i.e., the set of stimuli). For any such dataset, PRED provides a unified measure of the similarity between the vectors and the separability of the classes. Put simply, class-vector PRED measures whether vector A's value in a class is more similar to vector B's value in the same class than to B's value in another class. PRED is high when the distances are larger between values belonging to different vectors and different classes than between values belonging to different vectors but the same class (Figure 1a). In other words, a high value of PRED means that the two vectors have values not only with similar magnitudes but also with similar patterns across the classes. A zero value of PRED indicates that the two vectors have unrelated patterns across the classes. A negative value of PRED indicates that the two vectors have opposite patterns across the classes. Unlike correlation, PRED also accounts for the absolute differences between the values in the given vectors.

We compared PRED and five other metrics on their ability to report the similarity across vectors within a class-vector dataset. These five metrics included Pearson's correlation (PC), cosine similarity (COS), Manhattan distance (MAN), Euclidean distance (EUC), and Chebyshev distance (CHEB).
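The class-vector PRED described above can be sketched in code. This is a minimal reading of the verbal description, not the authors' implementation (which is in their Materials and Methods): for every pair of vectors and every pair of classes, a 2x2 submatrix is formed, d1 is taken as the mean within-class distance, d2 as the mean across-class distance, and (d2 - d1) / (d2 + d1) is averaged over all submatrices:

```python
import numpy as np
from itertools import combinations

def _dist(x, y):
    # Euclidean distance; works whether each entry is a scalar or a vector
    return float(np.linalg.norm(np.atleast_1d(np.asarray(x) - np.asarray(y))))

def pred(data):
    """Class-vector PRED sketch for a (vectors x classes[, ...]) array.

    For each pair of vectors (a, b) and each pair of classes (i, j):
    d1 averages the within-class distances |a_i - b_i|, |a_j - b_j|;
    d2 averages the across-class distances |a_i - b_j|, |a_j - b_i|.
    """
    data = np.asarray(data, dtype=float)
    n_vec, n_cls = data.shape[0], data.shape[1]
    scores = []
    for a, b in combinations(range(n_vec), 2):
        for i, j in combinations(range(n_cls), 2):
            d1 = (_dist(data[a, i], data[b, i]) + _dist(data[a, j], data[b, j])) / 2
            d2 = (_dist(data[a, i], data[b, j]) + _dist(data[a, j], data[b, i])) / 2
            if d1 + d2 > 0:
                scores.append((d2 - d1) / (d2 + d1))
    return float(np.mean(scores)) if scores else 0.0
```

For example, pred([[1, 2, 3], [1.1, 2.1, 3.1]]) is high (about 0.85), since the vectors share both magnitudes and pattern, while pred([[1, 2], [2, 1]]) is -1, since the vectors have opposite patterns, matching the interpretation given in the text.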
PRED, PC, and COS values range between -1 and 1, where 1 denotes high similarity; MAN, EUC, and CHEB range from 0 to ∞, where 0 denotes high similarity. To enable a direct comparison of the values of all these metrics, we transformed the distance-based metrics (MAN, EUC, and CHEB) to a range between 0 and 1 using a negative exponential (see Materials and Methods), such that 1 denotes high similarity for all the metrics (Supplementary Figure 1a (i)). We use the transformed distance-based metrics in all subsequent analyses unless otherwise stated.

For interpreting the values of a metric, it is helpful to know its chance level, i.e., the metric's expected value for random data. For example, suppose a metric's observed value for a given dataset is high relative to its chance level. In that case, one can reasonably infer that the vectors in the dataset have a high similarity: the larger the difference, the higher the similarity. It is further desirable that the chance level remain unchanged with the size of the dataset (the number of classes in the dataset) so that values obtained from different datasets, regardless of their size, can be directly compared. To test each metric's chance level, we simulated two different random datasets, one with 2 and the other with 5 classes. Each dataset included 10 vectors (with length equal to the number of classes) sampled from a uniform distribution between 0 and 1, ensuring no inherent similarity between vectors and no inherent difference between classes (see Materials and Methods for details). As expected, the observed chance level of PRED, PC, and COS was nearly 0 for both the 2-class and 5-class datasets; it was greater than 0 for MAN, EUC, and CHEB for both types of datasets (Figure 1b). Moreover, the chance levels of MAN, EUC, and CHEB differed between the datasets with different numbers of classes (Figure 1b).
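As an aside, the distance-to-similarity transform mentioned above can be sketched as follows; exp(-d) is an assumed form of the negative exponential (it maps d = 0 to similarity 1 and d → ∞ to similarity 0), while the exact transform used in the paper is specified in its Materials and Methods:

```python
import numpy as np

def to_similarity(distance):
    # Assumed negative-exponential mapping from a distance in [0, inf)
    # to a similarity in (0, 1]: d = 0 gives 1, and the similarity
    # decays monotonically toward 0 as the distance grows.
    return float(np.exp(-float(distance)))
```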
This difference occurs because the distances between vectors depend on the vectors' sizes; we can observe this change in chance levels more directly with the untransformed MAN, EUC, and CHEB metrics, all of which showed larger values with more classes (Supplementary Figure 1b). We tried to normalize these metrics according to the number of classes, for example, by dividing MAN values by the number of classes or dividing EUC values by the square root of the number of classes. Although these normalizations reduced the overall differences between the chance levels for different numbers of classes, the differences remained significant (Supplementary Figure 1c). Thus, distance-based metrics do not provide a stable chance level.

Another important consideration for assessing a metric's utility is its ability to report the level of similarity for a dataset, and its modifications, in a way that matches intuition. We had previously reported PRED's advantages over PC in calculating stereotypy (Mittal et al., 2020). Here, we extend this analysis to include the other metrics. If the responses in a vector are the same for both classes, PRED reports a value of 0; however, PC is undefined, and COS reports a high value (Supplementary Figure 1a (ii)). If the two vectors exhibit opposite patterns across the classes (Supplementary Figure 1a (iii)), PRED and PC appropriately quantify the similarity as -1. COS, however, still reports a value close to 1, which does not match the intuitive difference between the two vectors. The distance-based metrics also fail to capture this difference: they report the same values of similarity in Supplementary Figure 1a (iii) and (iv), even though in one case the vectors exhibit opposite patterns and in the other case they exhibit similar patterns across the two classes (Figures 1c-e).
We found that PRED exhibited the highest dynamic range and lowest variability among all the metrics (Figure 1f). Even for simulated datasets with different base means, PRED was consistently more robust than the other metrics (Supplementary Figures 1d, e). Thus, PRED remains informative across a relatively large range of noise levels in the dataset and provides a relatively stable estimate of similarity.

PRED for behavioral similarity assessment

We previously applied PRED to comparing the similarity of neural response patterns to an odor set across individuals (Mittal et al., 2020). However, in principle, it can be applied to any dataset where the data are arranged as vectors (each vector's length equals the number of classes).

Our results above (Figure 1f) indicated that PRED is more stable than PC for noisy data. Therefore, we reasoned that it would also be more robust when working with incomplete datasets. The 141-fly behavioral dataset provided a suitable test case for this idea. We randomly selected 70 flies from the dataset and calculated the similarity of the preference index vectors at the two time points using PRED and PC. This random sampling was repeated 20 times, each resulting in a different value of PRED and PC. Even with incomplete datasets, both metrics reported significant similarity: 0.20 ± 0.04 (P = 8.9 × 10^-15, n = 20; one-sample t-test compared to 0) for PRED, and 0.37 ± 0.10 (P = 6.8 × 10^-13, n = 20) for PC. Note that the PRED values were less variable (smaller s.d.) over the repeated samplings. Even the coefficient of variation (COV), defined as the s.d. divided by the mean, over these 20 samplings was smaller for PRED (0.21) than for PC (0.27) (Figure 2b). Since these observed values of the COV may depend on the specific 20 samplings that occurred, we repeated the whole process of 20 samplings a total of 50 times and each time calculated the COVs for both metrics. This analysis confirmed that the COV was consistently lower for PRED (P = 2.8 × 10^-15, n = 50, two-sample paired t-test; Figure 2c). Thus, PRED provides a relatively stable estimate of similarity for partial samplings of the dataset.

In many neural datasets, the response is a single number, such as the total number of spikes (or the net firing rate) evoked by a stimulus within a pre-defined time window. However, one may want to look at the response in finer detail, for example, by considering the temporal pattern of spikes evoked by the stimulus. We can represent the temporal pattern as a set of numbers by dividing the time window into, say, 10 bins and then counting the spikes in each bin. Thus, the response to a stimulus is now itself a 10-element vector rather than a single number (Figure 3a). In this case, if we want to compare the responses to a set of stimuli in two individuals, we need to compare two vectors of vectors rather than two vectors of numbers (Figure 3a).

Although correlation is frequently used to quantify the similarity between vectors, it is not equipped to handle vectors of vectors. A common modification to use correlation in such cases is concatenating the internal vectors within the outer vector to obtain a single (and long) vector. In the example discussed earlier, it would mean combining the two 10-element vectors corresponding to the two stimuli to obtain a 20-element vector for each individual and then calculating the correlation between the 20-element vectors of the two individuals (Figure 3a).
On the other hand, PRED is natively equipped to handle vectors of vectors and does not require concatenation: it involves calculating Euclidean distances between the values, which we can do irrespective of whether the values are single numbers or vectors. In the example discussed above, we can calculate d1 and d2 for PRED based on the 10-dimensional Euclidean distances between the binned responses and then compute PRED using the regular formula, (d2 - d1) / (d2 + d1) (Figure 3a).

We used both PRED and PC to compare the firing rates or the 10-bin temporal patterns evoked by odors in different individuals (see Materials and Methods). We performed this analysis on two different datasets: the olfactory responses of a mushroom body output neuron, bLN1, in locusts (Gupta and Stopfer, 2014) and of four different projection neurons in Drosophila (Shimizu and Stopfer, 2017). We used a 2-second window after odor onset to calculate the responses; in these datasets, the responses typically returned to baseline within 2 seconds in response to the 1-s odor pulse. Therefore, we can consider any spikes observed after this window as part of the background spiking. For the temporal analysis, we divided this response into ten bins, each of length 200 ms (Figure 3a). Both PRED and PC revealed significant similarities between individuals and showed that the similarity was slightly lower when considering the temporal patterns instead of only the firing rates (Figure 3b and Supplementary Figures 2a-d).

Although PRED and PC behaved similarly in this analysis, PC can run into problems because of the concatenation step. Concatenation removes the distinction between the values belonging to different bins within the same class and the values belonging to different classes.
For example, after concatenation, analyzing the 10-element temporal responses to 2 stimuli becomes identical to analyzing the firing rate responses to 20 independent stimuli, with each element contributing equally to the correlation. To illustrate why this can be problematic, consider the case when the temporal response includes bins beyond the stimulus-evoked response; these bins would be mostly empty except for some noise. Since empty bins are similar by nature, including such bins in the response vectors and effectively treating them as independent stimuli after concatenation would spuriously increase the observed correlation.

In contrast, the calculation of Euclidean distances in PRED would be minimally affected by the empty bins: the distances would only become slightly noisier because of the noise in the empty bins. Thus, PRED would report slightly lower similarity, which is a more intuitive outcome given the inclusion of irrelevant bins. To test these predictions in the actual datasets analyzed here, we included extra bins after the initial 10 bins of 200-ms duration. For example, in an 11-bin response, the first 10 bins would contain the first 2-s response after odor onset, while the last bin would contain an extra 200-ms response from 2 to 2.2 s after odor onset. Since the stimulus-evoked response typically lasted for less than 2 s, the extra bins included after the 2-s response are usually empty except for some noise. We found that, as predicted, the PC values increased as we added more and more extra bins to the response, whereas the PRED values decreased (Figure 3c and Supplementary Figures 2f-i). We further simulated a dataset containing two odors and ten individuals. The first 10 bins contained a simulated temporal response, and the subsequent bins contained random noise (see Materials and Methods).
There was a noticeable increase in the PC values in these simulations with an increasing number of extra bins (Figure 3d). The effect became more pronounced when we added empty bins (i.e., bins with a value of 0) instead of bins with normally distributed noise. In this case, PRED values were constant, as the empty bins did not affect the distances in PRED calculations (Supplementary Figure 2e). These results illustrate the pitfalls of using concatenated vectors in PC and suggest that PRED is a better alternative when working with multi-dimensional data.

Another type of multi-dimensional data is population-level data, i.e., the responses of, say, 6 neurons from the same neural layer in two individuals responding to two stimuli. To analyze such a case, we can either calculate the similarity separately for each neuron and then take the average, or directly consider the 6-element population response vector for each individual and odor. We used PRED to compare these two approaches, using a published dataset of calcium imaging responses of 37 antennal lobe glomeruli responding to 36 pure odors in 61 individuals (Badel et al., 2016). The similarity observed between individuals using the population vectors was significantly higher than the average similarity of neurons considered separately (0.37 compared to 0.25 ± 0.10, P = 1.7 × 10^-10, n = 37; one-sample t-test; Figure 3e). These results suggest that the combined cell population preserves more similarity within the system than individual cells, echoing previous studies' results (Mittal et al., 2020). The results also illustrate the usefulness of PRED in analyzing population-level data.
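The concatenation pitfall described above can be reproduced in a small toy simulation. This is an illustrative sketch, not the paper's analysis: `concat_corr` is a hypothetical helper and the response values are made up; the point is that appending shared empty bins to both individuals' responses pushes the concatenated-vector correlation upward:

```python
import numpy as np

def concat_corr(resp_a, resp_b):
    # Concatenate each individual's per-stimulus temporal vectors into one
    # long vector, then take the Pearson correlation between individuals
    # (the approach the text cautions against).
    a, b = np.concatenate(resp_a), np.concatenate(resp_b)
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical 3-bin responses of two individuals to two stimuli
ind_a = [np.array([0., 5., 1.]), np.array([2., 0., 4.])]
ind_b = [np.array([5., 0., 1.]), np.array([0., 2., 4.])]

r_plain = concat_corr(ind_a, ind_b)
# Append six empty (all-zero) post-stimulus bins to both individuals
pad = [np.zeros(6)]
r_padded = concat_corr(ind_a + pad, ind_b + pad)
print(r_plain, r_padded)  # ~ -0.32 vs ~ 0.15: the padded correlation is higher
```

The two individuals' responses are identical in the padded bins, so the shared zeros act like perfectly matching "stimuli" and spuriously inflate the correlation, exactly as described for the empty post-stimulus bins.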
Class separability

The datasets we have considered so far had a class-vector structure (as shown in Figure 1a). A metric such as correlation tells us only about the similarity between the vectors but is a poor indicator of class separability, as can be seen by comparing Supplementary Figure 1a (iii) and (vi). In these datasets, there is a correspondence between the i-th value in class 1 and the i-th value in class 2, as they both belong to the same vector in row i (which could be an individual, a time point, or any other variable depending on the experimental context). However, many datasets do not have this correspondence (i.e., there are no row vectors). For example, in neuroscience, one often measures the responses of a neuron or a brain region to different stimuli (classes) and takes multiple measurements (called trials or samples) for each stimulus. In such cases, we are left with only classes (columns), with each class containing multiple values (as shown in Figure 4a). This formatting is common in datasets with repeated measurements over multiple classes. Here, the numbers of samples for different classes do not have to be identical. Each sample value within a class may be a single number (e.g., the firing rate of a neuron or the preference index of an animal) or a set of numbers (e.g., a binned temporal response or a population response). Assuming that the samples within a class are generated under identical experimental conditions and that the samples in different classes are generated independently, there is no logical correspondence between the i-th sample in class 1 and the i-th sample in class 2. We will refer to such datasets as class-sample datasets. In such datasets, one often wants to know about the separability of the classes.

One study used the Silhouette index (Rousseeuw, 1987) to evaluate the quality of the clustering obtained.
Another study used the Silhouette index to assess the efficiency of single nucleotide polymorphism genotyping assays in dividing samples into 3 different groups: homozygous for the first allele, homozygous for the second allele, or heterozygous (Lovmar et al., 2005). Apart from the Silhouette index (Rousseeuw, 1987), the efficacy of a clustering technique can be evaluated using other internal clustering validation indices, like the Davies-Bouldin index (Davies and Bouldin, 1979) or the Dunn index (Dunn, 1974). Another method commonly used to measure class separability is Euclidean template matching (ETM), which involves classifying each value based on its Euclidean distance from class templates (constructed from the remaining data) and then calculating the average accuracy of these classifications (Stopfer et al., 2003).

Since the PRED value for a class-vector dataset depends on class separability, we asked whether PRED can also be used as a measure of class separability in class-sample datasets (Figure 4a). We compared PRED to five commonly used metrics: the Silhouette index (SIL), Davies-Bouldin index (DBI), Dunn index (DUNN), ETM, and Calinski-Harabasz index (CH) (see Materials and Methods for a description of each metric). As an initial test of PRED's feasibility for this application, we used two different datasets containing repeated responses to different odors. We obtained one dataset from the identified bLN1 neuron in locusts (Gupta and Stopfer, 2014) and another from four identified projection neurons in Drosophila (Shimizu and Stopfer, 2017). Each dataset contains the responses from multiple individuals; we compared the odor separability calculated using PRED and the other metrics for each individual. We found that PRED values were somewhat correlated with the values from the other metrics in both datasets (Figures 4b-f and Supplementary Figures 3a-e).
(Note that the correlation with DBI is negative because a lower DBI value indicates higher separability, whereas the opposite is true for PRED and the other four metrics.) These correlations with the established metrics suggested that PRED might also be useful as a metric of class separability. To explore this further, we compared PRED's performance with that of the other metrics in various situations.

As discussed in the analysis of class-vector datasets, a key feature of any metric is its chance level. For evaluating the chance level of separability metrics in class-sample datasets, we simulated datasets containing clusters (classes) of points with fixed radii on a 2-d plane and different levels of noise (Supplementary Figure 3f; see Materials and Methods for details). As we increase the noise in the simulated dataset, the classes lose their separability (Supplementary Figures 3f-h). We used datasets with extremely high noise levels to calculate the chance level of each of the six metrics. Further, we checked how the chance levels depend on the number of classes in the dataset. PRED showed a chance level close to 0, regardless of the number of classes. CH showed a chance level greater than 0 that was not different for 2-class or 5-class datasets (Figures 4g, l). However, the chance levels of the other four metrics changed significantly with the number of classes (Figures 4h-k).

Imagine a large dataset containing many classes where any two classes have the same level of separability, whose value is not known to us. Further, imagine that, for practical reasons, we have access to only a subset of the dataset covering some of the classes, and our task is to use this subset to estimate the separability in the complete dataset.

We also examined how each metric depends on the number of samples per class. We found that CH, ETM, and DUNN values varied significantly with the number of samples (Figure 5b), while PRED, SIL, and DBI were relatively stable.
We conclude that PRED provides an unbiased estimate of class separability regardless of the number of classes or the number of samples per class. Therefore, we can reliably use it with datasets of all sizes.

We next studied the stability of each metric against noisy data by checking the dynamic range and the variability at the midpoint of the dynamic range. We simulated datasets with noise levels ranging from zero (highly separable classes) to very high (poorly separable classes). As before, we estimated the dynamic range as the range of noise levels for which a metric remained unsaturated, and the variability as the percent standard deviation over repeated simulations with noise at the midpoint of the dynamic range (Figures 6a-f). PRED and SIL showed the best combination of large dynamic range and small variability (Figure 6g). DUNN had the lowest dynamic range and high variability, while DBI exhibited a high dynamic range but also the highest variability (Figure 6g). We used the Drosophila and locust datasets to complement the simulation results. We added increasing amounts of noise to each value in the datasets and then compared the metrics (Supplementary Figure 4; see Materials and Methods). Again, PRED and SIL exhibited large dynamic ranges and small variabilities in all cases. DUNN and DBI showed a high dynamic range in some cases but were the worst performers in variability for most neurons. Overall, PRED and SIL appear to be the most robust metrics in handling noisy datasets. Considering that SIL values (including the chance level) depend on the number of classes, as discussed above, PRED appears to be the best among the considered metrics for quantifying class separability (summarized in Table 2).

A previous study found that flies' odor preferences varied more across individuals than across trials within an individual.
Consistent with this, they also found that the odor responses of the projection neurons were more variable across individuals than across trials, suggesting that this response individuality may underlie the behavioral individuality. The behavioral individuality depended on serotonin: it was reduced when the flies were fed alpha-methyl tryptophan, a serotonin synthesis blocker. However, somewhat unexpectedly, they did not detect a reduction in the response individuality in the presence of the serotonin blocker. Their analysis used principal component analysis and Bayesian modeling to compute inter-fly and intra-fly distances. Since quantifying individuality requires an assessment of inter-individual differences relative to intra-individual differences, we reasoned that individuality could be aptly described by class separability, where the individuals are classes and the trials are samples within each class. We reanalyzed their data using individual-trial (class-sample) PRED to quantify the individuality of the PN responses to different odors (Figure 7c; see Materials and Methods). In wild-type flies, we observed that 50% (84 out of 168) of the PN-odor responses were significantly separable across individuals (Figure 7d).

To estimate stereotypy in the glomerular input vectors of groups at a particular hierarchy level, we calculated the group-dataset (class-vector) PRED (Figure 8a). At the level of 'connectivity types,' we found that the PRED value was 0.56 ± 0.25 (P = 4.4 × 10^-193, n = 496), notably higher than the chance level of 0, suggesting that the averaged connectivity vectors were separable across connectivity types and similar across the two datasets. Similarly, high PRED values were also seen at other grouping levels (e.g., cell type), indicating similarity of the LHN groups across the two databases.
The above analysis compared the averaged glomerular connectivity patterns of different groups. Next, we sought to assess whether the glomerular connectivity patterns of different neurons within a group were more consistent than the patterns of neurons across different groups at the same hierarchy level. This could be easily quantified as group separability using group-neuron (class-sample) PRED. In both datasets, we found that the 'connectivity type' groups were separable.

Experimental studies may be limited in the amount of data that they can collect, in terms of, for example, how many different stimuli one can present, how many individuals one can study, or how many trials one can perform. Also, experimental data are subject to noise from multiple sources. Thus, it is desirable to analyze datasets with a metric that is robust to noise. In our study, PRED exhibited the largest dynamic range and the lowest variability among the metrics tested. It also worked well with incomplete datasets (Figures 1, 2, and Supplementary Figure 1).

Many metrics are available for calculating the similarity of vectors when each value within the vector is a scalar quantity (a number). However, we cannot directly use these metrics when each value within the vector is itself a vector (a set of numbers), as is the case with temporally patterned neural responses or population responses. One could forcibly convert the vector of vectors into a long vector of numbers through concatenation. However, concatenated vectors lose the distinction between classes and the elements of values within a class. As we showed by simulating increasingly longer temporal patterns, this can lead to an inaccurate estimation of similarity.
On the other hand, PRED provides a more straightforward and intuitive method for analyzing multi-dimensional data while preserving the inherent relations between different dimensions (Figure 3 and Supplementary Figure 2).

We found that PRED also works well for analyzing class separability in class-sample datasets, as the results with PRED were well correlated with those obtained from other commonly used metrics. PRED provided a stable chance level and was unaffected by the dataset's size, whereas most of the other metrics that we tested varied with an increase in the number of classes or samples. We tested the robustness of several internal clustering validation metrics to noisy datasets. In these analyses using simulated and experimental data, PRED was consistently among the metrics with the highest dynamic range and the lowest variability. Thus, PRED presents a consistent and more reliable alternative for evaluating class separability in class-sample datasets (Figures 4-8 and Supplementary Figures 3-5).

When dealing with large datasets, one consideration in choosing a metric is its computational time complexity. Since PRED calculates the similarity iteratively for all combinations of pairs of classes and pairs of vectors, its time complexity is of the order of (C choose 2) × (V choose 2), i.e., O(C^2 V^2), where C and V are the numbers of classes and vectors, respectively. Thus, the time required to compute PRED increases polynomially with an increase in the dataset's size. Other class-vector metrics, including Pearson's correlation, cosine similarity, and the distance-based metrics, have O(C V^2) time complexity. However, datasets in many applications are small enough (C, V ≤ 100) that the time complexity of PRED would not become a limiting consideration.
We originally designed PRED for class-vector datasets, in which there is a correspondence between the i-th element in class 1 and the i-th element in class 2, as both elements belong to the same vector (row). The PRED calculation makes use of this correspondence when constructing the 2×2 matrices for a pair of classes: if a 2×2 matrix has the i-th and the j-th values from class 1, it must have the i-th and the j-th values from class 2. In class-sample datasets, this correspondence across classes is absent, as there is no ordering among the class elements: all samples are random replicates. This lack of order poses a dilemma while calculating PRED: which pair of values in class 2 should we use for making the 2×2 matrix with a particular pair of values in class 1? We overcome this dilemma by considering all possible pairs from class 2 iteratively for a given pair of values in class 1. This method ('exhaustive PRED') increases the number of combinations from the order of C(C − 1)/2 × N(N − 1)/2 to C(C − 1)/2 × [N(N − 1)/2]², i.e., from O(C²N²) to O(C²N⁴), for class-sample datasets, assuming each of the C classes has O(N) elements (Supplementary Figure 5a). In practice, the extra time required for 'exhaustive PRED' would be noticeable only for large datasets with hundreds of classes and samples. The calculation can be made faster using an approximation ('fast PRED'). In 'fast PRED,' we assign an arbitrary order to the elements in each class (e.g., the order in which the values were saved) and then create 2×2 matrices in the same way as is done in class-vector datasets: when we take the i-th and the j-th values from class 1, we also take the i-th and the j-th values from class 2. Using simulations (see Materials and Methods), we found that the difference between the 'exhaustive PRED' and the 'fast PRED' values was ~3% for datasets with more than 15 samples (Supplementary Figure 5b). Changing the ordering of elements within classes did not have a noticeable effect on the value of PRED.
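The difference between the two pairing schemes can be sketched as follows for two classes of scalar samples; this is a minimal illustration in which the function names and the exact cross-pairing convention inside each 2×2 matrix are our assumptions:

```python
from itertools import combinations
import numpy as np

def _ratio(x1, x2, y1, y2):
    # PRED ratio for one 2x2 matrix: x1, x2 from one class, y1, y2 from the other.
    d1 = (x1 - x2) ** 2 + (y1 - y2) ** 2   # within-class distances
    d2 = (x1 - y2) ** 2 + (x2 - y1) ** 2   # across-class distances
    return (d2 - d1) / (d2 + d1) if (d1 + d2) > 0 else 0.0

def exhaustive_pred(class1, class2):
    # Every pair from class 1 is evaluated against every pair from class 2.
    ratios = [_ratio(x1, x2, y1, y2)
              for x1, x2 in combinations(class1, 2)
              for y1, y2 in combinations(class2, 2)]
    return float(np.mean(ratios))

def fast_pred(class1, class2):
    # Impose an arbitrary order and reuse the same index pairs in both classes,
    # exactly as in the class-vector calculation.
    n = min(len(class1), len(class2))
    ratios = [_ratio(class1[i], class1[j], class2[i], class2[j])
              for i, j in combinations(range(n), 2)]
    return float(np.mean(ratios))
```

For datasets with more than two classes, the same computation would be averaged over all class pairs.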
Thus, we can efficiently and reliably compute PRED for large class-sample datasets.

Besides the metrics we have already compared with PRED, other metrics with similar approaches, such as the t-statistic or the Fisher discriminant, can potentially be used for analyzing class-sample datasets. However, these metrics have their drawbacks. The calculation and the interpretation of the t-statistic depend on the degrees of freedom, which are a function of the number of samples observed. Discriminant analysis assumes a linear separation between the classes and thus might not be ideal for neural datasets. Another approach, formulated by Huerta et al. (Huerta et al., 2004), also quantifies intra-class and inter-class differences. They calculated average within-class and across-class distances, similar to our D1 and D2 calculations. They then quantified the similarity across classes by measuring the difference between the across-class and within-class distances, normalized by the maximum expected value of this difference. The normalization procedure is highly dependent on the type of system under consideration, and it might not be possible to calculate the denominator in many cases. PRED is self-normalizing and system-agnostic, providing a consistent estimate of class separability for any dataset.

So far, we have computed D1 and D2 as the Euclidean distances between within-class and across-class values. In principle, one can use any distance measure in place of Euclidean distances for calculating PRED. For example, one can use the Mahalanobis distance to account for the different variabilities of the various dimensions of a response, or the Hamming distance to compare datasets with binary or categorical values. For temporal data, instead of binning the responses, one could use methods like the Victor-Purpura (Victor and Purpura, 1996, 1997) or the van Rossum (van Rossum, 2001) distances to calculate the distance between spike trains.
This flexibility in the choice of the distance metric may help in the future in optimizing PRED for different use cases.

Materials and Methods

Class-vector PRED

We generalized the definition of PRED from our previous work (Mittal et al., 2020) to all class-vector datasets. We considered all possible combinations of pairs of vectors and pairs of classes to calculate the PRED value. Each such combination yields a 2 × 2 matrix containing the values a1 and a2 (the values of vector a for the two classes) and b1 and b2 (the corresponding values of vector b). For each 2 × 2 matrix thus obtained, we computed two distances (Figure 1a): D1 = (a1 − b1)² + (a2 − b2)² is the sum of the squared Euclidean distances between the values belonging to the same classes in different vectors; D2 = (a1 − b2)² + (a2 − b1)² is the sum of the squared distances between the values belonging to different classes in different vectors. We used the ratio (D2 − D1) / (D2 + D1) to estimate the PRED value in each 2 × 2 matrix. To obtain the final PRED value for a particular dataset, we first averaged the values over all class pairs before averaging over all vector pairs. Cases with missing data were ignored for the calculation of the mean. Note that in the calculations described here, the Euclidean distances can be easily calculated even if the values (a1, a2, b1, b2) are not numbers but are equal-sized vectors (see Figure 3a for an example). PRED ranges between 1 and −1, where 1 indicates that the vectors have identical values and patterns across classes, 0 indicates that the vectors have no similarity and have random patterns across the classes, and −1 indicates that the vectors have exactly opposite patterns across the classes.

Class-sample PRED

We used a slightly modified method of calculating PRED (labeled 'exhaustive PRED') for class-sample datasets (Figure 4a): for a given pair of values from one class, all possible pairs of values from the other class were considered iteratively, as described above. We compared the similarity values obtained with PRED to those obtained with Pearson's correlation (PC), cosine similarity (COS), Manhattan distance (MAN), Euclidean distance (EUC), and Chebyshev's distance (CHEB). If the dataset included more than two vectors, each of the metrics was calculated over all possible pairs of vectors and then averaged. PC was computed
using the corr function in MATLAB; while analyzing experimental datasets, any rows with incomplete data were removed. COS was calculated as 1 − cosine distance using the cosine option of the pdist function in MATLAB. The distance-based metrics MAN, EUC, and CHEB were calculated using the pdist function with the options cityblock, euclidean, and chebychev, respectively. Since the range of the distance-based metrics (MAN, EUC, and CHEB) was between 0 and ∞, we transformed these metrics using the negative exponential function f(x) = e^(−x), which mapped the range to be between 1 and 0 such that a value close to 1 indicated a small distance (high similarity) between the vectors.

For class-sample datasets, we compared PRED with ETM, the Silhouette index (SIL), the Davies-Bouldin index (DBI), and the Calinski-Harabasz index (CH). ETM is based on a simple algorithm for calculating classification accuracy (Stopfer et al., 2003). Briefly, a template was created for each class by averaging the values within the class, excluding the test sample. Next, for each sample in the dataset, the Euclidean distances between the sample and all the templates were calculated. If the smallest distance belonged to the template of the actual class of the sample, the sample was correctly classified and scored as 1 (if templates of n classes, including the actual class of the sample, had the same smallest distance, the score was set to 1/n). Otherwise, the score was set to 0. The average of the scores from all the samples was reported as the final value of ETM. ETM ranges between 0 and 1, where 1 denotes the highest level of class separability (every sample is correctly classified). We used a custom function written in MATLAB for calculating the ETM values. The Silhouette index compares the pairwise intra-class and inter-class distances (Rousseeuw, 1987). It ranges between 1 and −1, where 1 indicates high separability. DBI is calculated as the ratio of within-class and between-class distances (Davies and Bouldin, 1979).
It ranges from 0 to ∞, where 0 indicates high separability. CH measures the ratio of the average intra-class and inter-class variances (Caliński and Harabasz, 1974). It ranges between 0 and ∞, where a higher value indicates higher separability. SIL, DBI, and CH were calculated using the evalclusters function in MATLAB, with the options Silhouette, DaviesBouldin, and CalinskiHarabasz, respectively. DUNN calculates the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance (Dunn, 1974).

The chance level for each metric was calculated using datasets with no inherent similarity or separability. For the class-vector metrics, we simulated a dataset of 10 vectors and either 2 or 5 classes. Each value within the dataset was randomly drawn from a uniform distribution between −1 and 1, ensuring no structure within the classes or the vectors. The whole simulation was repeated 1000 times, and the vector similarity metrics were reported. For the class-sample metrics, we simulated a 2-dimensional clustered dataset with 10 samples and either 2 or 5 classes. The cluster radius was set to 0.05 for all the classes, and a large 1 × 2 noise term randomly drawn from (0, 50) was added to simulate inseparable clusters. The whole simulation was repeated 1000 times, and the class separability metrics were reported.

Dynamic range and variability

The dynamic range was defined as the range of noise levels in which a metric remains informative (i.e., does not saturate near the maximum or the minimum level). We simulated a dataset with increasing levels of noise (on a log scale). We measured the average value reported by the metric at the 5 lowest noise levels (f(low)) and at the 5 highest noise levels (f(high)) simulated. The absolute difference between these two values, |f(low) − f(high)|, was called the vertical range of the metric.
For a metric whose value decreased with increasing noise, the left boundary of the dynamic range was taken as the lowest noise level at which the average value of the metric was lower than the value at the lowest noise level by at least 1% of the vertical range, i.e., x_left = min(x) : f(x) < f(low) − 0.01 × |f(low) − f(high)|. The right boundary of the dynamic range was taken as the highest noise level at which the average metric value was greater than the value at the highest noise level tested plus 1% of the vertical range, i.e., x_right = max(x) : f(x) > f(high) + 0.01 × |f(low) − f(high)|. The dynamic range was calculated as |x_right − x_left|. The variability of the metric was defined as the standard deviation of the metric at the mid-point of the dynamic range divided by its vertical range, i.e., variability = σ(x_mid) / |f(low) − f(high)|, where σ(x) represents the standard deviation in the metric values at the noise level x. For the class-vector metrics, we simulated a dataset with 10 vectors and 2 classes. The mean responses of the two classes were set to 2 and 4, respectively. The value for a class was randomly drawn from N(μ, σ), where μ is the class mean and σ = 10^k with k ∈ [−2, −1.9, −1.8, …, 3], to simulate increasing noise levels on a log scale covering 5 orders of magnitude. Each simulation was repeated 1000 times, and the resultant similarity was measured using each metric. We repeated the entire experiment with increasing base means, i.e., we added an integer value to the mean response of the classes.

over the entire dataset (Figure 2a). To compare the stability of the two metrics for incomplete data, we randomly sampled 70 out of 141 individuals from the dataset. We calculated the PRED and PC values for this subset, repeating the random sampling 20 times. We then calculated the coefficient of variation of each metric over these 20 random samplings.

PRED for each of the 37 different glomeruli separately (Figure 3e).
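The dynamic-range and variability procedure described above can be sketched numerically for a metric that decreases with noise; the function and variable names are illustrative, not from the original code:

```python
import numpy as np

def dynamic_range_and_variability(noise_levels, values, stds):
    """Dynamic range and variability of a metric that decreases with noise.

    noise_levels: sorted 1-D array of (log-spaced) noise levels
    values:       mean metric value at each noise level
    stds:         standard deviation of the metric at each noise level
    """
    noise_levels = np.asarray(noise_levels, dtype=float)
    values = np.asarray(values, dtype=float)
    stds = np.asarray(stds, dtype=float)
    f_low = values[:5].mean()     # average over the 5 lowest noise levels
    f_high = values[-5:].mean()   # average over the 5 highest noise levels
    vr = abs(f_low - f_high)      # vertical range
    # Lowest noise level at which the metric has dropped by >= 1% of vr ...
    left = noise_levels[values < f_low - 0.01 * vr].min()
    # ... and highest noise level still >= 1% of vr above the floor.
    right = noise_levels[values > f_high + 0.01 * vr].max()
    dyn_range = abs(right - left)
    # Variability: std at the midpoint of the dynamic range / vertical range.
    mid_idx = np.argmin(np.abs(noise_levels - (left + right) / 2))
    variability = stds[mid_idx] / vr
    return dyn_range, variability
```

For example, for a metric that decays linearly from 1 to 0 over 51 noise levels, the boundaries fall where the curve first leaves the top plateau and last stays above the bottom plateau.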
For analyzing the odor-individual (class-vector) PRED with temporal responses, we extracted both the firing rate and the temporal response of the neurons for a period of 2 s after odor onset. The firing rate was calculated as the total number of spikes within the 2-s period from 2 to 4 s in the response minus the number of spikes in the 2-s period before odor onset, from 0 to 2 s in the response.

The full dataset consisted of unique connectivity types as classes and the two databases as vectors. The connectivity vectors of the neurons within each connectivity type were averaged. Each cell within this matrix was a 57-length vector of averaged and normalized connectivity weights of the corresponding LHN to each glomerulus. We first calculated the connectivity type-database (class-vector) PRED value over this matrix to characterize the similarity of connections across databases. Next, we grouped this matrix based on each of the different hierarchy levels. We averaged the connectivity vectors over all connectivity types belonging to a group within a particular hierarchy to get a matrix with groups as columns and the databases as rows (Figure 8a). We then calculated the group-database (class-vector) PRED values for each hierarchy level based on cell type, tract, or region.

In the experiment where we characterized the separability of neurons across groups based on their connectivity to antennal lobe glomeruli, we constructed 4 different matrices with individual neurons (not averaged over connectivity types) as samples and the relevant group types as classes for the two databases separately (Figure 8b). We then calculated the group-neuron (class-sample) PRED for each matrix to characterize the separability of neural connectivity vectors across groups for each hierarchy level.
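The baseline-subtracted firing-rate calculation described earlier in this section can be sketched as follows; the function name and the half-open window convention are our assumptions:

```python
import numpy as np

def baseline_subtracted_count(spike_times, odor_onset=2.0, window=2.0):
    """Odor-evoked spike count: spikes in the 2-s window after odor onset
    minus spikes in the 2-s window before it (times in seconds)."""
    t = np.asarray(spike_times, dtype=float)
    evoked = np.sum((t >= odor_onset) & (t < odor_onset + window))
    baseline = np.sum((t >= odor_onset - window) & (t < odor_onset))
    return int(evoked - baseline)
```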

To compare a set of PRED values with the baseline (0) or a specific mean, we used a one-sample two-sided t-test. To compare the chance levels of the metrics across numbers of classes, we used two-sample two-sided unpaired t-tests. For comparing the coefficients of variation obtained for PRED with those for PC, we used a two-sided paired t-test.
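These tests correspond to standard routines available in, for example, SciPy; the sketch below uses small illustrative (not actual) values:

```python
import numpy as np
from scipy import stats

pred_values = np.array([0.42, 0.55, 0.38, 0.61, 0.47])  # illustrative PRED values
pc_values = np.array([0.40, 0.50, 0.33, 0.58, 0.45])    # illustrative paired PC values

# One-sample two-sided t-test of the PRED values against the baseline of 0.
t0, p0 = stats.ttest_1samp(pred_values, popmean=0.0)

# Two-sample two-sided unpaired t-test (e.g., chance levels for 2 vs. 5 classes).
t1, p1 = stats.ttest_ind(pred_values, pc_values)

# Two-sided paired t-test (e.g., coefficients of variation for PRED vs. PC).
t2, p2 = stats.ttest_rel(pred_values, pc_values)
```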