Nonmetric ANOVA: a generic framework for analysis of variance on dissimilarity measures

Classic Analysis of Variance (ANOVA; cA) tests the explanatory power of a partitioning imposed on a set of objects. Nonparametric ANOVA (npA) extends this to the case where, instead of the object values themselves, their mutual distances are available. While considerably widening the applicability of cA, npA does not provide a statistical framework for cases where the mutual dissimilarity measurements between objects are nonmetric. Based on the central limit theorem (CLT), we introduce nonmetric ANOVA (nmA) as an extension of the cA and npA models in which the metric properties (identity, symmetry, and subadditivity) are relaxed. Our model allows any dissimilarity measure to be defined between objects whenever the distinctiveness of a specific partitioning imposed on them is of interest. This derivation accommodates an ANOVA-like framework of judgment, indicative of significant dispersion of the partitioned outputs in nonmetric space. We present a statistic which, under the null hypothesis of no difference between the means of the imposed partitioning, follows an exact F-distribution, allowing the consequential p-value to be obtained. Three biological examples are provided, and the performance of our method in relation to cA and npA is discussed.

measured with metric functions, there are cases for which the use of these functions is no more than a mere coercion. For example, transforming nonmetric outputs to a Euclidean metric (12), or incorporating a metric deduced from a principal coordinates analysis on the dissimilarity matrix (11), permits the npA model to operate but leads to invalid judgments, since it compresses and deforms the data beyond sufficiency. The failure to appropriately acknowledge the inherent nature of the data is at the core of misjudgments based on such deficient coercions. Network graphs arising in multiple biological studies, for example (13-15), can exhibit nonsymmetric measurements between nodes, lack an edge between two nodes, or even contain self-looped nodes; all of these are cases where the metric properties are negated (S1). Each of these nonmetric scenarios should therefore be represented with a dissimilarity matrix between the nodes (rather than being metric-forced), and so demands a statistical framework that can take this input as it is. Here, based on the central limit theorem (CLT) and harnessing the natural relationship between statistical distributions (16), we generalize the cA method to the nonmetric case by

Significance Statement
The Nonmetric Analysis of Variance (nmANOVA) conveys a framework that provides a compatible type of ANOVA for cases where proper metric measurements between objects are lost, unknown, or otherwise inaccessible. While classic ANOVA is based on measurements of the data from a base datum, nmANOVA is formulated on the dissimilarity outputs (not necessarily metric) defined between all objects. As the main goal of ANOVA is to provide a statistical test for assessing the significance of a considered partitioning of the data, nmANOVA yields a parallel scheme of inference by 1) accommodating the outcome dissimilarities into within- and between-group statistics, 2) assessing their respective divergence with a parametric distribution, and 3) providing a resultant p-value indicative of evidence for rejecting the null hypothesis.

In many practical cases the dissimilarities expressed between items do not follow the metric properties (17). Based on the triangle inequality, the simultaneous affirmation of all derived equations presented in (10) is necessary for a set of dissimilarities to be defined as metric (18). The negation of each or all of these equations (and hence the invalidation of the metric definitions) has been extensively classified (19). For example, the term semimetric denotes scenarios where the triangle inequality fails to hold in general, while a pseudometric refers to cases where distinct objects may have zero dissimilarity.

The F_nm formulated in (2) encapsulates the ratio of divergence of the between-group to the within-group means. Under the null hypothesis of no difference between the mean values of the partitioning groups, the sum of normalized means of dissimilarity values between each pair of partitions would be the same as their within counterparts (S2). To the degree of deflection from this ground truth, the observed F_o would be inflated, and the quantity Pr(F_{g^2-g, g-1} > F_o) would be the assigned p-value indicating whether this deflection is significant at a designated α-level threshold. Table 1 summarizes the essentials of this model with respect to the metric ones.
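This decision rule can be sketched in a few lines (assuming only that an observed statistic F_o and the number of groups g are in hand; `nma_pvalue` is a hypothetical helper name, not code from the paper):

```python
from scipy.stats import f

def nma_pvalue(F_o: float, g: int) -> float:
    """p-value for an observed nmA statistic F_o with g groups.

    Per the text, under H0 the statistic follows an exact
    F-distribution with g^2 - g and g - 1 degrees of freedom.
    """
    return f.sf(F_o, g * g - g, g - 1)

# With g = 3 groups the reference distribution is F(6, 2);
# a larger observed F_o yields a smaller p-value.
p_small = nma_pvalue(19.0, 3)
p_large = nma_pvalue(1.0, 3)
```

The judgment is then made by comparing the returned p-value against the designated α-level threshold.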

General properties of nmA

nmA is sensitive to group differences. To assess nmA performance in different situations, we conduct a series of simulations from Gaussian and uniform distributions, covering various scenarios via adjustment of the distribution parameters. While literally any other parametric distribution could be harnessed here, this choice of models was encouraged because, at a constant variance level, the Gaussian distribution is known to have the minimum negentropy (31, 32); hence the most chaotic sequences of values are expected. The uniform distribution, on the other hand, is chosen as it is usually taken as the state of uninformativeness in Bayesian scenarios (33).

First, we are interested in the performance of nmA when the null hypothesis is true. Under this condition, for any parametric test, the p-values should follow the U(0, 1) distribution (34). In our case, using the same sampling distribution for the generation of the dissimilarities, with a given partition imposed on them, led to the consistent conclusion (Figure 1A).
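The calibration check of Figure 1A can be sketched as follows (a stand-in simulation: we draw the "p-values" directly from U(0, 1) rather than recomputing nmA on 1,000 null dissimilarity matrices):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)

# Under a true null, p-values of a well-calibrated parametric test
# are U(0, 1).  Here uniform draws stand in for nmA p-values
# computed on null dissimilarity matrices.
p_values = rng.uniform(0.0, 1.0, size=1000)

# Two-sided Kolmogorov-Smirnov test against the U(0, 1) CDF,
# as in Figure 1A of the text.
ks_stat, ks_p = kstest(p_values, "uniform")
```

A large `ks_p` indicates no detectable departure from uniformity.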

Second, moving toward the alternative hypothesis, we investigated the sensitivity of the model with respect to marked changes in the means of the partitioning groups as a measure of deviation from the null hypothesis. Incrementally shifting the values of the between groups with respect to the within groups led to the expected inflation of the F_o statistic in the nmA model (Figure 1B). Despite the sensitivity of nmA to the location parameter, scale changes did not have the same significant effect: increasing the standard deviation of the between relative to the within groups does not lead to marked changes in the nmA judgment, denoting the robustness of the method with respect to the scale dimension of the inputs (Figure 1C).
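The contrast between location and scale sensitivity can be illustrated with a toy two-block setup (a sketch, not the nmA statistic itself: we only compare the gap between the mean between-group and mean within-group dissimilarities; `mean_gap` is an illustrative helper):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_gap(loc: float, scale: float, n: int = 2000) -> float:
    """Absolute difference between mean between-group and mean
    within-group dissimilarities in a toy two-block setup."""
    within = rng.normal(0.0, 1.0, n)
    between = rng.normal(loc, scale, n)
    return abs(between.mean() - within.mean())

# Shifting the location of the between-group values inflates the gap...
gap_shifted = mean_gap(loc=2.0, scale=1.0)
# ...while inflating only their spread leaves the gap near zero.
gap_scaled = mean_gap(loc=0.0, scale=3.0)
```

A mean-based statistic such as F_nm therefore responds to the first manipulation but stays largely flat under the second.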

Contrary to its conventional use, nmA can assist in selecting the most similar sub-clustering of the objects aggregated in a dissimilarity matrix. This application of the method was assessed by tailoring the apparent differentiation between the groups and allowing the model to retrieve it (Figure 2A). To do that, we first simulate a dissimilarity matrix representing 4 groups of different sizes: 13, 21, 34, 55. Then we split the matrix into 3 groups by joining the first two clusters, or 5 groups by splitting the last partition into 2 groups of sizes 21 and 34. We fix the within- and between-group dissimilarity distributions to N(0, 1) and N(5, 1) respectively. It is important to note that we select the cluster sizes on purpose, as they represent a sub-sequence of the Fibonacci sequence, allowing minimum inclusion of the between elements into the within when joining or splitting the clusters. Fixing the number of clusters to 3 as described, we obtain clusters of sizes 34, 34 and 55. This will not only provide relatively similar sizes for the clusters but also guarantee the smallest influence on the dissimilarity values between the original cluster 1 and cluster 2.
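The simulated matrix described above can be sketched as follows (assumptions: independent Gaussian draws per cell and a zero diagonal; `block_dissimilarity` is an illustrative helper, and no symmetry is imposed since nmA does not require it):

```python
import numpy as np

rng = np.random.default_rng(2)

def block_dissimilarity(sizes, mu_within=0.0, mu_between=5.0, sd=1.0):
    """Simulate a dissimilarity matrix with N(mu_within, sd) entries
    inside the diagonal group blocks and N(mu_between, sd) entries
    in the off-diagonal (between-group) blocks, as in the text."""
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    d = np.where(same,
                 rng.normal(mu_within, sd, (n, n)),
                 rng.normal(mu_between, sd, (n, n)))
    np.fill_diagonal(d, 0.0)  # self-dissimilarity set to zero
    return d, labels

# Fibonacci-subsequence cluster sizes from the text: 13, 21, 34, 55.
D, labels = block_dissimilarity([13, 21, 34, 55])
```

Joining or splitting clusters then amounts to relabeling `labels` before re-applying the test.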

D R A F T
The high-level similarity of nmA to cA is its closed-form distribution of the test statistic leading to the respective p-value, while the matrix structure of the input is common between npA and nmA, albeit the former is based on distances (D) and the latter on dissimilarities (∆) (S1-S3).

Fig. 1. The consistency of the Fnm under different scenarios. A)
Uniform distribution of the p-values. Uniform distribution of 1,000 p-values for three equal-sized groups of 10 when the whole dissimilarity matrix comes from the U(0, 1) distribution. The results of a two-sided Kolmogorov-Smirnov test are highlighted on top. The same behavior was observed with simulation from a normal distribution with 4 groups of 10 each and 1,000 replications (p-value = 0.635). B) The effect of progressively changing the mean of the between-group matrices' distribution on the p-values. We use the previous group dissimilarity matrix as our starting point and continuously change one of the uniform distribution parameters via an adjustment factor ψ (an incremental adjustment of a and b in U(a, b)). At every step i in [1, 10], we change the between-group matrix distributions to U(0 + ψi, 1 + ψi) and obtain the corresponding p-value. At the same time, we track the effect of the proportion of between-group data affected via the ψ parameter. Three scenarios are visualized (from left to right): when only the group 1 and group 2 pairwise between-group matrices are adjusted; when the group 1 and group 2 but also the group 1 and group 3 dissimilarity matrices are simultaneously adjusted; and when all 6 possible between-group matrices are simultaneously adjusted. The red dots represent each p-value with the corresponding ψ, and their mean values are connected with the solid blue lines. The red dashed line corresponds to a p-value equal to 0.05. C) The effect of progressively changing the standard deviation of the between-group matrices' distribution on the p-values. While in the previous case the difference in between-group distributions was triggered via changing the mean of the uniform distribution, here we fix the mean but change the standard deviation via adjusting the between-group distribution according to U(0 − ψi, 1 + ψi). The adjustment is performed for the same three scenarios as previously.
As indicated by the lower values (of the means and medians) and the steeper gradient of descent (solid blue lines), nmA is more sensitive to dislocation of the location (mean) of the values than to the dispersal (standard deviation) parameter.

Asymptotic sampling improves accuracy. To assess the convergence of p-values with an increasing number of proportional samplings (S1), we perform simulations in which we collect a sequence of p-values corresponding to the same dissimilarity matrix but with a varying number of proportional samplings. Every p-value in a sequence is obtained with one more proportional sampling added to that of the previous p-value. For these simulations we select 3 groups of sizes 10, 20 and 30 and sample the dissimilarity matrix from U(0, 1) for within-group matrices and U(2, 4) for between-group matrices. Thus, we introduce a difference between the groups via increased mean and standard deviation, and expect the p-values to reflect it. To check the asymptotic sampling behaviour, we apply nmA to the matrix and increase the proportional sampling number from 1 to 2 × 10^6. As Figure 2B shows, the p-values capture the significant difference between the groups and converge to their mean value. This example demonstrates that one can improve the accuracy of a p-value by increasing the number of proportional samplings, while even a small number of proportional samplings can be indicative of a significant difference.
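This convergence behaviour can be mimicked with a running mean over an increasing number of samplings (a sketch with synthetic draws standing in for the per-sampling p-values; the 0.003 level is arbitrary, not from the paper's data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the p-value sequence obtained with an
# increasing number of proportional samplings: noisy draws around
# a stable (significant) value.
draws = np.clip(rng.normal(0.003, 0.002, 20000), 0.0, 1.0)

# Running mean over an increasing number of samplings; by the law
# of large numbers the estimate stabilizes, mirroring Figure 2B.
running = np.cumsum(draws) / np.arange(1, draws.size + 1)
early_spread = running[:100].std()
late_spread = running[-100:].std()
```

The spread of the running estimate shrinks as samplings accumulate, which is the sense in which more proportional samplings "improve accuracy".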

nmA outperforms cA and npA in nonmetric cases. We also consider the comparative performance of cA, npA and nmA. We note that while npA reduces to cA with Euclidean inputs (9), the relation of nmA to either cA or npA is not canonical. So any application of the nmA method to metric data would be no more than a coercion, and hence the results would be neither comparable nor supportive of any judgment. Here, however, we were interested in examining some of the main data modulations that at least structurally format the data to be a transferable input across different

is split into two groups: 26 monocots (one seed leaf plants) and 52 eudicots (two seed leaf plants) (Table S3)

The higher-order indices of nucleotide sequence similarity (e.g. bit score, E-value (40)) are obtained from an underlying nonmetric nucleotide substitution matrix. In fact, all the nucleotide substitution models that allow for nonsymmetric nucleotide substitution rates (F81, HKY85, TN93, and GTR) will essentially result in a nonmetric similarity measure between sequences (41). As the metric conditions can also be negated in the case of codon or amino acid substitution models (such as BLOSUM or PAM), the use of nmA can be extended to downstream assessment of partitionings constructed on the outputs of these models as well (42-45).

Drug combination sensitivity scoring for cell lines. Combination sensitivity scores (CSS) have been introduced to provide information about the efficacy of drug combinations applied to cancer cells (46). After drug sensitivity estimation, it is usually of interest to see whether particular drug groups lead to similar sensitivity patterns. The directional nature of the CSS scores, with respect to which drug combination is the baseline, renders the CSS quasimetric. Consequently, nmA can be applied to any partitioning of drugs to evaluate its significance. In this example, we apply nmA to assess the significance of drug clusters obtained from drug combination experiments. We download relative inhibition (RI) scores (which are sensitivity scores in single drug experiments) and CSS from the DrugComb portal (47). We filter the data to come from the O'Neil drug combination study, as it contains the most complete dataset (48). The data provide RI and CSS for 38 unique drugs and 39 cancer cell lines (SD2). For each cell line, we obtain dissimilarity matrices for the 38 drugs based on the correlations of their CSS1 or CSS2 values (Figure S3). Afterwards, we apply hierarchical clustering on the obtained matrices and fix the number of clusters to 2, aiming to find the most distinctive drug partitioning. To avoid cases where the clustering is governed by a small number of drugs (e.g. due to exceptional drug sensitivity patterns), we apply nmA to dissimilarity matrices that have at least three drugs in each partition. As a result, we identify 7 cell lines out of 39 for which the obtained partitioning is significant (Table 2, Figure S4).
For example, the 38 drugs are clearly separated into two clusters for the EFM192B cell line (Figure S3), which can be explained by their mechanisms of action: the majority of the first group's compounds are targeted drugs, while the second group mostly consists of conventional chemotherapy drugs, namely antineoplastic agents and mitotic and topoisomerase inhibitors (Table S4). The same trend can be seen for the remaining 6 cell lines (Figure S4), which supports the use of nmA as a tool for assessing the significance of a proposed clustering.
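The clustering step of this example can be sketched with synthetic profiles (assumptions: two latent response patterns stand in for the real CSS profiles of the 38 drugs; the `1 − |corr|` dissimilarity and the at-least-three-drugs-per-cluster filter follow the text):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)

# Two latent sensitivity patterns over 30 conditions; 19 drugs follow
# each pattern, with small independent noise (synthetic stand-in data).
base1, base2 = rng.normal(size=30), rng.normal(size=30)
profiles = np.vstack([base1 + 0.3 * rng.normal(size=(19, 30)),
                      base2 + 0.3 * rng.normal(size=(19, 30))])

# Correlation-based dissimilarity, as in the text: 1 - |corr|.
dissim = 1.0 - np.abs(np.corrcoef(profiles))
np.fill_diagonal(dissim, 0.0)

# Hierarchical clustering with the number of clusters fixed to 2.
Z = linkage(squareform(dissim, checks=False), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")

# Keep the partitioning only if each cluster has at least 3 drugs,
# mirroring the filtering applied before running nmA.
sizes = np.bincount(clusters)[1:]
valid = bool((sizes >= 3).all())
```

A partitioning that passes the filter would then be handed to nmA for the significance judgment.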

Fig. 2. nmA properties and applications. A)
Distribution of p-values to define the optimal clustering. The original dissimilarity matrix clustering of size 4 has been rearranged to sizes 3 and 5 to demonstrate that nmA's p-value can serve as an indicator for selecting the optimal clustering of the data. Additionally, up to 60 between-cluster pair switches were performed to shuffle the fixed clustering. The dotted lines represent the average p-values over 1,000 scenarios of switched pairs. The shaded area spans between the minimum and maximum of the p-values obtained for each case. B) The convergence of p-values with asymptotic sampling. The p-values converge to their mean value (highlighted in red) after approximately 10^4 proportional samplings. C) nmA is more conservative with metric inputs. Four vectors with 100 elements each were created using N(10, 1) for cA, and either the Euclidean or the Canberra distance was applied to them to obtain a 400×400 distance matrix for npA. To make nmA applicable, the matrix was transformed by placing the elements of its upper triangular part randomly into the upper and lower triangular parts of a new matrix, taking into account which grouping the elements belong to. The values that have not been imputed in this manner are treated as missing. We further add a small noise from N(0, 5 × 10^-4) to all the values of the matrix. After that, the mean of either one, two, or three vectors was progressively increased from 0 to 2. The figure shows the space in which a method considers the current data partitioning significant. While cA and npA congruently detect differences at smaller mean differences, nmA exhibits a more conservative trend. D) nmA outperforms cA and npA in nonmetric cases. The 400×400 matrix with within- and between-group matrices drawn from the N(30, 1) and N(10, 1) distributions is used for nmA, and its coerced symmetric version for npA.
The symmetry was achieved by taking the average, minimum, or maximum of the two symmetric counterpart values of the matrix, or by utilizing the upper triangular part of the created matrix as the input for npA. Similarly to the previous case, we start to increase the mean of either the one-vs-two, the two-vs-three, or the three-vs-four between-group matrix while keeping the within-group matrices the same. This time the result indicates sharper detection of the differences between the means for nmA compared with npA. E) MDS bipartitioning of the BLAST data. MDS applied to the distance matrix obtained for every species pair as the average of their two nonsymmetric dissimilarities. The reciprocal BLAST MDS clearly separates the data into monocots and eudicots. nmA confirms this grouping with a p-value = 0.044. F) PCA-based grouping of the bladder gene expression data. Some of the sample groups are separated from the others while some are rather mixed. nmA provides a significant p-value = 0.041, as it is sensitive even to a single between-group difference in the data.

It can be seen that the KL-based dissimilarity matrix separates the 3 carcinoma groups, namely mTCC, sTCC-CIS and sTCC+CIS, from the Normal and Biopsy groups even better than PCA. Therefore, nmA distinguishes both the Normal and the Biopsy group as significantly separated from the 3 carcinoma groups (Table S5). At the same time, neither any bipartition within the 3 carcinoma groups nor the Normal versus Biopsy partition is found to be significant (Table S5). The initial partitioning of the bladder gene expression data into 5 groups is significant with a p-value of 0.041. Similarly to cA, the nmA test provides a significant p-value if at least one distinct bipartition is present in the data.
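The symmetrization coercions used for the npA comparison can be sketched directly (a minimal illustration; `delta` is an arbitrary nonsymmetric matrix, not the paper's 400×400 simulation):

```python
import numpy as np

rng = np.random.default_rng(5)

# A nonsymmetric dissimilarity matrix with a zero diagonal.
delta = rng.uniform(0.0, 1.0, (6, 6))
np.fill_diagonal(delta, 0.0)

# The coercions used to feed npA in the comparison: average, minimum,
# or maximum of the two directed values delta_ij and delta_ji...
sym_avg = (delta + delta.T) / 2.0
sym_min = np.minimum(delta, delta.T)
sym_max = np.maximum(delta, delta.T)
# ...or simply mirroring the upper triangular part.
sym_upper = np.triu(delta, 1) + np.triu(delta, 1).T
```

Each coercion discards the directional information that nmA consumes as-is, which is the source of the performance gap reported in Figure 2D.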

We introduced a generic type of nonmetric ANOVA method based on the dissimilarities between a set of objects. These dissimilarities need not be defined with a metric function nor produced systematically. As long as a numeric value representing the closeness of a set of items subject to different partitions is given, nmA can be applied.

Under the null hypothesis of equality of all partition means, the F_nm as presented below follows the F-distribution with g^2 − g and g − 1 degrees of freedom;

where SNMB_δ and SNMW_δ are the sums of squared normalized means of dissimilarity values between and within the partitioning groups; where δ̄_W is the overall mean of all the dissimilarity values in the partitioning groups, δ̄_W = Σ_α Σ_β δ_αβ ε_αβ / Σ_α Σ_β ε_αβ; ε is the indicator function defined in (13); N_off is the number of elements in all off-diagonal sub-matrices; and N_diag is the number of elements in all diagonal sub-matrices.

S2. Proof of the F_nm distribution. Based on the central limit theorem, the mean of n_j^2 unknown, independently distributed δ_αβ with population mean µ and variance σ^2 is normally distributed; hence, normalization of these values leads to standard normal variables. Now, under the null hypothesis that the means of the competing groups are the same, H_0: ∀ j, j' ∈ (1, . . . , g), µ_jj' = µ, and hence δ̄_11 = δ̄_12 = · · · = δ̄_gg = δ̄_W, the sum of the squared values above is distributed as χ^2_{g−1}. This is true since the sum of squares of g standard normal random variables is chi-squared distributed.
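The chain of reasoning in S2 can be restated compactly (a sketch; notation follows S1, with n_{jj'}^2 dissimilarities per block, the within blocks losing one degree of freedom through the estimation of µ by δ̄_W):

```latex
% CLT on the block means, then normalization to standard normals:
\bar\delta_{jj'} \sim \mathcal{N}\!\left(\mu,\ \sigma^2/n_{jj'}^2\right)
\;\Longrightarrow\;
z_{jj'} = \frac{n_{jj'}\left(\bar\delta_{jj'} - \mu\right)}{\sigma} \sim \mathcal{N}(0,1)
% Sums of squares over the g within blocks and g^2 - g between blocks:
\chi^2_W = \sum_{j} z_{jj}^2 \sim \chi^2_{g-1},
\qquad
\chi^2_B = \sum_{j \neq j'} z_{jj'}^2 \sim \chi^2_{g^2-g}
% Ratio of chi-squares over their degrees of freedom:
F_{nm} = \frac{\chi^2_B/(g^2-g)}{\chi^2_W/(g-1)} \sim F_{g^2-g,\ g-1}
```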

Following the same logic for the dissimilarities between groups, where we have tacitly used the equality of the mean values of the dissimilarities between groups with those within groups; for the same reason, albeit unlike χ_W, we do not lose an extra degree of freedom in χ_B. Again, noticing that the ratio of two chi-squared distributed variables, each divided by its corresponding degrees of freedom, is F-distributed, and rearranging the values, we obtain [9]; cancelling the 1/σ^2 and g − 1 terms then yields (2). Note that F_nm captures the amount of noncentrality incurred by the degree of deflection of the between mean dissimilarities in contrast to the within. To the extent of this deflection, F_nm would be proportionally inflated, and if the observed value F_o exceeds a certain degree (Pr(F_{g^2−g, g−1} > F_o)), it stands as evidence for significant differences between the competing groups.

assumptions and the available information from the experiment; assuming the same independent variable, cA analyzes the response values themselves while npA analyzes pairwise distances. The latter is more suitable for cases where defining an absolute value for the responses is much more difficult than defining their relative order.

npA results from using the fundamental relationship underlying the topology of a set of points. Specifically, the sum of squared distances between the points and their centroid is equal to the sum of squared inter-point distances divided by the number of points. For a set of N elements this means that SST = SSTd_E, where SSTd_E is the mean of the sum of squared Euclidean distances. Analogous to cA, it is possible to partition this sum into between and within sums of squared distances; the values of the latter formula are equal to their counterparts in (11). The result generalizes to any metric function d, resulting in the decomposition formula (11), where η_α takes the value n_j if the αth observation is in the jth group, and ε_αβ is equal to 1 if observations α and β are in the same group and 0 otherwise. Similarly, SSB_d and SSW_d denote the between and within means of sums of squared distances. Note that deriving SSB_d is not straightforward, but one can simply make use of the decomposition rule of sums of squares and calculate it as above.
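The centroid identity that npA rests on is easy to verify numerically (a sketch with arbitrary random points; any dimension works):

```python
import numpy as np

rng = np.random.default_rng(6)

# For N points, the sum of squared distances to the centroid equals
# the sum of squared inter-point distances divided by N.
X = rng.normal(size=(50, 4))          # N = 50 points in R^4
centroid = X.mean(axis=0)

sst_centroid = ((X - centroid) ** 2).sum()

diffs = X[:, None, :] - X[None, :, :]
pairwise_sq = (diffs ** 2).sum(axis=-1)           # squared Euclidean distances
sst_pairs = pairwise_sq.sum() / (2 * X.shape[0])  # each pair counted twice
```

This is why npA never needs the raw coordinates: the total sum of squares is recoverable from the distance matrix alone.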

3. Hypothesis testing. In cA, the significance of a given partitioning is tested by comparing the averaged ratio of the between to the within variance, hence the F-statistic, which under the null hypothesis follows the F_{g−1, N−g} distribution. We refer to the value of the test statistic based on the observed outcomes of the experiment as F_o. This value is then compared against the null F-distribution to obtain the p-value, P = Pr(F_{g−1, N−g} > F_o). Rejection of the null hypothesis indicates the presence of at least one group whose mean value differs from the others. With a similar indication for the p-value, in npA a pseudo permutative F-statistic, pF_np, is defined; indexing each value of pF_np obtained from permutation π as pF_np^π, the p-value can be obtained as

CSS-based matrix with rows and columns corresponding to the drugs. At the same time, we fill the diagonal values with RI scores, as they bring additional information about the drugs' sensitivity. Then, obtaining two directional correlation-based matrices, we subtract the absolute values of a correlation matrix applied to either the CSS-based matrix or its transposed version from 1. After that, we create a joint matrix by filling the upper and lower triangular parts of the new matrix with the upper triangles of the two obtained symmetric matrices. Thus, we take the correlation-based drug similarities into account as nonsymmetric. We also fix the diagonal values of the matrix to zero, given that the row and column drug is the same. The obtained EFM192B-specific bipartitioning, which is marked by the two black lines on the dissimilarity-matrix-based heatmap, resulted in a p-value = 0.034.
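The cA test described here can be sketched and cross-checked against a standard implementation (synthetic groups; SciPy's `f_oneway` is used only as a reference for the manual formula):

```python
import numpy as np
from scipy.stats import f, f_oneway

rng = np.random.default_rng(7)

# Three groups with different means; N = 60 observations in total.
groups = [rng.normal(m, 1.0, 20) for m in (0.0, 0.5, 1.0)]
N, g = 60, 3

# Classic ANOVA F: ratio of between to within variance estimates.
grand = np.concatenate(groups).mean()
ssb = sum(len(x) * (x.mean() - grand) ** 2 for x in groups)
ssw = sum(((x - x.mean()) ** 2).sum() for x in groups)
F_o = (ssb / (g - 1)) / (ssw / (N - g))
p = f.sf(F_o, g - 1, N - g)           # P = Pr(F_{g-1, N-g} > F_o)

# Cross-check against SciPy's one-way ANOVA.
F_ref, p_ref = f_oneway(*groups)
```

The manual statistic and the library result coincide, which is the cA baseline that both npA (via permutation) and nmA (via the exact F_{g^2−g, g−1} law) parallel.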