GENVISAGE: Rapid Identification of Discriminative and Explainable Feature Pairs for Genomic Analysis

Motivation: A common but critical task in genomic data analysis is finding features that separate and thereby help explain differences between two classes of biological objects, e.g., genes that explain the differences between healthy and diseased patients. As lower-cost, high-throughput experimental methods greatly increase the number of samples that are assayed as objects for analysis, computational methods are needed to quickly provide insights into high-dimensional datasets with tens of thousands of objects and features.

Results: We develop an interactive exploration tool called Genvisage that rapidly discovers the most discriminative feature pairs that best separate two classes in a dataset, and displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially when the numbers of objects and features are large, we propose a suite of optimizations to make Genvisage more responsive and demonstrate that our optimizations lead to a 400X speedup over competitive baselines for multiple biological datasets. With this speedup, Genvisage enables the exploration of more large-scale datasets and alternate hypotheses in an interactive and interpretable fashion. We apply Genvisage to uncover pairs of genes whose transcriptomic responses significantly discriminate treatments of several chemotherapy drugs.

Availability: Free webserver at http://genvisage.knoweng.org:443/ with source code at https://github.com/KnowEnG/Genvisage

1 Introduction

[...] heuristically return more interpretable feature pairs. However, these heuristic methods do not fully explore the search space, nor do they offer a guarantee on the quality of the returned feature pairs.

Challenge 2: Scalability in Data Size. A major problem with current methods that address the separability problem with either feature pairs or more complex machine learning models is that they do not scale to the growing size of genomic data sets. As is often the case with genomics, the biological objects being analyzed (e.g., tissue samples or drug experiments) are frequently represented by high-dimensional numeric feature vectors (e.g., transcript abundance measurements). Additionally, with the rise of low-cost sequencing, the number of biological objects in a dataset is also increasing, and is likely to grow by orders of magnitude over the next decade [17]. Applying the standard methods to datasets with tens of thousands of objects and features results in massive running times that preclude interactive exploration of the data. For example, exhaustively searching for the optimal feature pairs from the full space of possibilities in a typical genomic analysis resulted in running times of over an hour on a 200-node compute cluster in Watkinson et al. [9].

One reason that more complex machine learning and feature pair based methods do not scale well with the number of features and objects is the selection of the metric for scoring separability. In Watkinson et al. [9], a metric called synergy is proposed for evaluating the utility of feature pairs, aiming to capture both linear and non-linear [...]

[...] enable this scalability, GENVISAGE focuses on returning the top-ranking feature pairs that discriminate the objects of separate classes, rather than returning larger subsets of features using more complex and longer-to-train machine learning approaches.
GENVISAGE is also based around a linear separability metric that provides an intuitive interpretation of feature pairs while enabling and simplifying the design of several important performance optimizations. These optimizations include (a) elimination of repeated computation for different feature pairs; (b) pruning poorly ranking pairs during early execution; (c) sampling with a quality guarantee to further reduce running time; and (d) cleverly traversing the search space of feature pairs for improved efficiency.

We applied GENVISAGE to two large genomic datasets with tens of thousands of objects and high-dimensional feature vectors, where it is computationally expensive to score the separability of all possible feature pairs. In one, called LINCS, we find pairs of genes whose expression discriminates between perturbagen experiments involving different drug treatments, and in the other, called MSigDB, we find pairs of annotations (such as pathway membership) that separate differentially expressed cancer genes from other genes. With the carefully designed separability metric of GENVISAGE and its suite of sophisticated optimizations that accelerate evaluation, we are able to accurately return the highest-ranking separating feature pairs for both datasets within two minutes on a single machine. This reflects a 180X and 400X speedup over a competitive baseline for the MSigDB and LINCS datasets, respectively. We also show that the feature pairs identified by GENVISAGE often more significantly discriminate between the object classes than the corresponding best-ranking individual features, even after accounting for the larger search space. Finally, we performed an in-depth analysis of nine distinct drug treatments in the LINCS dataset and found 1070 feature (gene) pairs that had significant separability scores.
These gene pairs were enriched in literature support for known relationships between the genes and the drug, as well as for known interactions between the genes themselves.

Summarized Benefits of Using GENVISAGE. By focusing on separating feature pairs, GENVISAGE offers researchers the ability to gain additional insight into their object classes beyond singular features, without the prolonged duration needed to train a complex machine learning model. By implementing optimizations that take advantage of a linear separability metric, GENVISAGE enables researchers to quickly explore their data, identify the strongest, most compelling features, and, from simple visualizations, form hypotheses about the interplay among features and with the object classes. The performance of our tool also allows researchers to investigate multiple definitions of the object classes and explore alternative hypotheses interactively, on the fly, as well as build a feature set to later pass to more in-depth, longer-running machine learning-based analysis.

We begin by formally defining the separability problem, introducing our separability metric, and finally detailing the optimizations that enable the rapid identification of the best separating feature pairs.

Let M be a feature-object matrix of size m × N, where each row is a feature and each column is an object, as shown in Figure 1. One example feature-object matrix is one where each object corresponds to a tissue sample from a cancer patient and each feature corresponds to a gene, with the (i, j)-th entry representing the expression level of the i-th gene in the j-th sample.

Figure 1. Workflow. Given (left) a feature-object matrix and green positive and red negative class labels on the objects, GENVISAGE (center) evaluates all pairs of features using several optimizations to identify (right) the top feature pair and its corresponding visualization that best separates the object classes.
We are also given two non-overlapping sets of objects, one with a positive label, O+, and the other with a negative label, O−. In our example, the tumor samples, O+, may be assigned the positive label, and the healthy tissue samples, O−, the negative label. The number of labeled objects, n, is equal to

n = |O+| + |O−|.
GENVISAGE aims to find feature pairs that best separate the objects in O+ from those in O− using only those features, and then to output a visualization that demonstrates the separability (see Figure 1). (We will define the metric for separability subsequently.) A feature pair that leads to a good "visual" separation between the positive and the negative sets may be able to explain or characterize their differences via an interesting, non-trivial relationship between the features. The overall workflow is depicted in Figure 1. We now formally define the separability problem.

Problem 1 (Separability). Given a feature-object matrix M and two labeled object sets O+ and O−, find the top-k feature pairs that best separate O+ from O− under a given separability metric.

We will describe our separability metric in Section 2.2, and then discuss optimization techniques in Section 2.3. The notation used in the description of the method is summarized in Supplementary Table B. [...] corresponds to an intuitive 2-D visualization. Given a feature pair (f_i, f_j) and a line ℓ, we can predict the label of an object o_k, denoted η^{ℓ,k}_{i,j}, using Equation 1 below, where w_0, w_i, and w_j are the coefficients of ℓ and w_j > 0:

η^{ℓ,k}_{i,j} = sign(w_0 + w_i · M_{i,k} + w_j · M_{j,k}).    (1)

If o_k lies above the line ℓ, i.e., o_k has a higher value on the y-axis than the point on ℓ with the same value on the x-axis as o_k, then η^{ℓ,k}_{i,j} = 1; otherwise, η^{ℓ,k}_{i,j} = −1. Let θ^{ℓ,k}_{i,j} be the indicator variable denoting whether the sign of the predicted label matches the real label l_k: if η^{ℓ,k}_{i,j} · l_k = 1, then θ^{ℓ,k}_{i,j} = 1; otherwise, θ^{ℓ,k}_{i,j} = 0. Given a feature pair (f_i, f_j) and a line ℓ, the separability score of the line (denoted θ^ℓ_{i,j}) is defined as the sum of the indicators (θ^{ℓ,k}_{i,j}) over all objects:

θ^ℓ_{i,j} = Σ_k θ^{ℓ,k}_{i,j}.

[...]

Rocchio-based Measure. We can speed up the process by selecting a single representative line L, providing us with an estimate of the true separability score θ_{i,j}. In order to achieve a fast and reliable estimate, we select our representative line based on Rocchio's algorithm [22]. Let us denote the centroids of the positive objects O+ and the negative objects O− [...].
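The line-based prediction and score above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation; `line_separability` and its argument names are assumptions, with `Mi` and `Mj` holding the two feature rows and `labels` holding the ±1 object labels.

```python
import numpy as np

def line_separability(Mi, Mj, labels, w0, wi, wj):
    """Separability score of the line w0 + wi*x + wj*y = 0 (wj > 0) for
    feature pair (fi, fj): the number of objects whose predicted side of
    the line (+1 above, -1 below) matches their +/-1 class label."""
    assert wj > 0
    # eta_k = +1 if object o_k lies above the line, else -1
    eta = np.where(w0 + wi * Mi + wj * Mj > 0, 1, -1)
    # theta_k = 1 exactly when the predicted sign matches the true label
    return int(np.sum(eta * labels == 1))
```

The brute-force measure would maximize this score over all candidate lines for each feature pair, which is what makes the exhaustive approach expensive.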

Brute-force vs. Rocchio-based. Compared to the brute-force calculation, the Rocchio-based measure is much more lightweight, but at the cost of accuracy in calculating θ_{i,j}. Intuitively, the representative line is a reasonable proxy for the best separating line, since the Rocchio-based measure assigns each object to its nearest centroid. We further empirically demonstrate that θ^L_{i,j} is a good proxy for θ_{i,j} in Section 3.2. Thus, we will focus on the Rocchio-based measure subsequently, removing L (or ℓ) from the superscripts where it appears, and using θ_{i,j} and θ^L_{i,j} interchangeably.

[...] combination; however, in reality, careful engineering is necessary to "stitch" these modules together to multiply the effects of the optimizations.
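The nearest-centroid assignment behind the Rocchio-based measure can be sketched as follows. The function name and interface are hypothetical; the sketch scores a feature pair by assigning every object to its nearest class centroid in the 2-D feature space, which is equivalent to testing which side of the perpendicular bisector of the two centroids the object falls on.

```python
import numpy as np

def rocchio_separability(Mi, Mj, labels):
    """Estimate the separability of feature pair (fi, fj) with a single
    representative line by assigning each object to its nearest class
    centroid (Rocchio's rule) and counting correct assignments."""
    pts = np.stack([Mi, Mj], axis=1)          # objects as 2-D points
    c_pos = pts[labels == 1].mean(axis=0)     # centroid of O+
    c_neg = pts[labels == -1].mean(axis=0)    # centroid of O-
    d_pos = np.linalg.norm(pts - c_pos, axis=1)
    d_neg = np.linalg.norm(pts - c_neg, axis=1)
    eta = np.where(d_pos < d_neg, 1, -1)      # nearest-centroid label
    return int(np.sum(eta == labels))
```

Because only the two centroids are needed, this estimate touches each object once instead of searching over all candidate lines.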

Early Termination
Given a feature pair (f_i, f_j), we need to scan all the objects to compute the separability score θ_{i,j}. However, since we only need to identify feature pairs in the top-k, we can stop for each feature pair as soon as we can make that determination, without scanning all objects; we call this the EARLYSTOP module.

High Level Idea. We maintain an upper bound τ for the separability error err_{i,j} of the top-k feature pairs. Then, the lower bound of the separability score can be denoted as (n − τ). Given a feature pair (f_i, f_j), we scan the object list until the number of incorrectly classified objects exceeds τ. If it does, we can terminate early and prune this feature pair.

Figure 3. (a) With the EARLYSTOP module, the transformed M scores are scanned left to right and each incorrectly classified object is marked in blue. Without object ordering (above), evaluation terminates after five checked objects. When objects are reordered by the most "problematic" (below), the feature pair is rejected after checking only the first two objects. (b) To calculate the top-3 feature pairs with SAMPLING, the confidence interval of θ_{i,j} is calculated for every feature pair evaluated on the sample set S (above). The 3rd interval lower bound ζ is obtained (red dotted line), and all feature pairs with a larger upper bound are designated as candidates for validation (blue intervals). The selected candidates (center box) are evaluated on the whole object set Ô to compute the exact θ_{i,j} and pick the top-3 (right box).
Enhancement by Object Ordering. Although EARLYSTOP has the potential to always reduce the running time, its benefits are sensitive to the ordering of the objects for evaluation. Since we terminate as soon as we find τ incorrectly classified objects, we can improve our running time if we examine "problematic" objects, i.e., those unlikely to be correctly classified, relatively early. For this, we order the objects in descending order of the number of single features f_i that incorrectly classify the object o_k, i.e., for which M_{i,k} ≤ 0. Thus, the first object evaluated is the one that is incorrectly classified by the most single features. The benefit of this strategy is illustrated with an example in Figure 3(a).
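The EARLYSTOP loop and the problematic-first ordering might look like the following sketch. All names here are assumptions for illustration: `passes_earlystop` scans per-object error flags for one feature pair, and `problem_order` sorts objects by how many single features misclassify them.

```python
import numpy as np

def passes_earlystop(errors_in_order, tau):
    """Scan a feature pair's per-object error flags (pre-sorted so the
    most 'problematic' objects come first) and reject the pair as soon
    as more than tau objects are misclassified.
    Returns (survives, objects_checked)."""
    mistakes = 0
    for checked, is_error in enumerate(errors_in_order, start=1):
        mistakes += is_error
        if mistakes > tau:
            return False, checked      # pruned early, rest unscanned
    return True, len(errors_in_order)  # pair stays in contention

def problem_order(misclassified_by_feature):
    """Given a boolean (features x objects) matrix of single-feature
    mistakes, order objects most-misclassified-first."""
    counts = misclassified_by_feature.sum(axis=0)
    return np.argsort(-counts)
```

As in the Figure 3(a) example, a bad pair whose two most problematic objects are both misclassified is rejected after only two checks when τ = 1.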

Sampling-based Estimation
One downside of the EARLYSTOP module is that the improvement in the running time is highly data-dependent. Here, we propose a stochastic method, called SAMPLING, that reduces the number of examined objects. Instead of calculating θ_{i,j} over the whole object set Ô, SAMPLING works on a sample set drawn from Ô.

High Level Idea. SAMPLING primarily consists of two phases: candidate generation and validation (Figure 3(b)). In phase one, we estimate the confidence interval of θ_{i,j} for each feature pair using a sampled set of objects, and generate the candidate feature pairs for full evaluation based on where their confidence intervals lie. If a pair's confidence interval overlaps with the score range of the current top-k, then it is selected for evaluation. In phase two (lower half of Figure 3(b)), we evaluate only the feature pairs in the candidate set, calculating θ_{i,j} over the whole object set Ô, to obtain the final top-k feature pairs. Unlike our previous optimizations, SAMPLING returns an approximation of the top-k ranking feature pairs.

Candidate Generation. Let S be a sample set drawn uniformly from Ô. Given a feature pair (f_i, f_j), let θ_{i,j}(S) be the number of correctly separated objects in S. We can estimate θ̂_{i,j} from θ_{i,j}(S) using θ̂_{i,j} = (θ_{i,j}(S) / |S|) · n, by assuming that the ratio of correctly separated objects in S is the same as that in Ô. Using Hoeffding's inequality [23], we have that by selecting Ω(1/ε²) samples, θ_{i,j} lies in the confidence interval [θ̂_{i,j} − εn, θ̂_{i,j} + εn] with high probability. Since the sample size |S| is independent of the number of objects, this module helps GENVISAGE scale to datasets with large n.
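The sample-size and interval arithmetic can be made concrete as below. The two-sided Hoeffding constant ln(2/δ)/(2ε²) is the textbook form for a mean of 0/1 indicators, an assumption about the paper's exact analysis, and both function names are illustrative.

```python
import math

def hoeffding_sample_size(eps, delta):
    """Number of uniform samples so that the scaled estimate theta_hat
    is within eps*n of the true theta with probability >= 1 - delta
    (two-sided Hoeffding bound for a mean of 0/1 indicators)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

def confidence_interval(theta_S, sample_size, n, eps):
    """Scale the sample score theta_{i,j}(S) up to the full object set
    and return the interval [theta_hat - eps*n, theta_hat + eps*n]."""
    theta_hat = theta_S / sample_size * n
    return theta_hat - eps * n, theta_hat + eps * n
```

Note that the required sample size depends only on ε and δ, not on n, which is the property that lets this module scale to large object sets.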

Following the top half of Figure 3(b), we first calculate the confidence interval of θ_{i,j} for each feature pair (f_i, f_j). Next, we compute the lower bound of θ_{i,j} for the top-k feature pairs, denoted as ζ and shown by the red dotted line. Finally, we can prune away feature pairs whose upper bound is smaller than ζ, keeping the candidate set C of feature pairs depicted by blue intervals. These feature pairs C will be further validated in phase two, i.e., candidate validation. Typically, |C| will be orders of magnitude smaller than m², the original search space of all feature pairs.
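The pruning step just described can be sketched as follows (illustrative names; `intervals` maps each feature pair to its sampled confidence interval):

```python
def generate_candidates(intervals, k):
    """intervals: {pair: (lo, hi)} confidence intervals from the sample.
    zeta is the k-th largest lower bound; every pair whose upper bound
    reaches zeta survives as a candidate for exact evaluation, and all
    others are pruned."""
    lowers = sorted((lo for lo, _ in intervals.values()), reverse=True)
    zeta = lowers[k - 1]
    candidates = {p for p, (lo, hi) in intervals.items() if hi >= zeta}
    return candidates, zeta
```

Only the surviving candidates are then re-scored on the whole object set Ô in the validation phase.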

Candidate Validation. We re-evaluate all of the candidates generated in phase one to produce our final feature pair ranking. This evaluation is performed using the whole object set Ô, and the top-k feature pairs are reported (lower half of Figure 3(b)).

[Figure caption excerpt:] Note that ζ is increased in (d) after evaluating the third feature pair, and since ζ is larger than the upper bound of the fourth feature pair, candidate validation can terminate and return the top-ranking pairs.

Search Space Traversal
The optimizations discussed so far check fewer [...]

3 Results

In this section, we illustrate that GENVISAGE rapidly identifies meaningful, significant, and interesting separating feature pairs in real biological datasets. First, we describe the datasets and the algorithms used in our evaluation. Each algorithm that we evaluate represents a combination of optimization modules for ranking the top-k feature pairs using our Rocchio-based measure; we report the running time and accuracy of the algorithms.

Datasets. We consider datasets from two biological applications (see Table 1): (a) in MSigDB, we find gene annotations, such as pathways and biological processes, that separate the differentially expressed genes from the undisturbed genes in specific cancer studies; (b) in LINCS, we find genes whose expression levels can distinguish experiments in which specific drug treatments were administered from others.

In MSigDB, we are given a feature-object matrix with genes as the objects and gene properties as the features [...]. The positive object sets for MSigDB are the sets of differentially expressed genes (DEGs) in a specific cancer study downloaded from the Molecular Signatures Database (MSigDB) [25]. Each of our tests is an application of GENVISAGE to such a dataset, reporting pairs of properties that separate the DEGs of that cancer study (the "positive" set) from all other genes (the "negative" set).

In LINCS, the feature-object matrix contains expression values for different genes (features) across many drug treatment experiments (objects) conducted on the MCF7 cell line by the LINCS L1000 project [26]. The values of the matrix are the "level-4" imputed z-scores measured in the L1000 project. In each dataset, the positive object set includes multiple experiments that used the same drug, at varying dosages and for varying durations. We applied GENVISAGE to each dataset to find the top pairs of genes (feature pairs) whose expression values best distinguish the experiments of that drug from the rest.

Note that the average number of positive objects in any dataset is far fewer than the average number of negative objects. To address this imbalance, we adjust θ_{i,j} to a weighted-sum form [...].

Algorithms. We evaluated six combinations of our optimization modules from Section 2.3, listed in Table 2.

Table 2. Selected Algorithms Using Different Optimization Modules. All algorithms, including the Baseline, use TRANSFORMATION. In addition, EARLYSTOP and TRAVERSAL are coupled with object ordering and feature ordering by default, respectively. For each algorithm (row), we show which optimization modules are employed, whether the algorithm returns the exact answer or an approximate answer, and the running time complexity of that combination. The term "guarantee" ("heuristic", resp.) indicates that the returned answer comes with (without, resp.) a stochastic guarantee. In addition, m and n are the numbers of features and objects, S is the sampled set size, χ is the limit on the number of feature pairs considered, and C is the number of generated feature pair candidates.
[...] column of Table 2.
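The class-imbalance adjustment mentioned above can be illustrated with a class-balanced score. The exact weights in the paper's equation are not shown in this excerpt, so the constants below, which weight each correctly separated object inversely to its class size so that both classes contribute equally, are an assumption, as is the function name.

```python
import numpy as np

def weighted_separability(correct, labels):
    """Class-balanced variant of theta_{i,j}: each correctly separated
    object (correct[k] in {0,1}) is weighted inversely to its class
    size, so a perfect classifier scores n regardless of imbalance.
    NOTE: a plausible instantiation, not the paper's exact constants."""
    n = labels.size
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == -1)
    w = np.where(labels == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))
    return float(np.sum(w * correct))
```

Under this weighting, always predicting the majority class scores only n/2, rather than nearly n as the unweighted count would.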

In this section, we first justify that the Rocchio-based measure is a good proxy for the best possible separating score computed by a brute-force method. Then we compare the [...] 295 objects for each comparison. We call the brute force-based separability score the true separability score, since it examines all possible separating lines. We first find the best feature pair using the Rocchio-based measure and the brute force-based measure separately (potentially different feature pairs) and then calculate the ratio of the true separability scores of the Rocchio versus the brute-force best feature pairs. We observe that the Rocchio-based method picks a best feature pair that has a true separability score similar to the best pair picked by brute force, with the ratio of the two scores being better than 0.94 in all ten datasets (Supplementary Figure C.3(a)). Second, for the best feature pairs identified by the Rocchio-based method for the ten datasets, we calculate the ratio of the Rocchio-based separability score to the brute force-based separability score, and find this ratio to be greater than 0.96 on average (Supplementary Figure C.3(b)).

Running Time. Figure 5 depicts the running times of our selected algorithms. Each plotted box corresponds to one algorithm, representing the distribution of running times for finding the top-k feature pairs (by Rocchio score) across all datasets. [...]

[...] Baseline and the given algorithm. Figure 6 shows this separability quality comparison. [...]

Takeaways. If accuracy is paramount, SampOpt is recommended; if running time is paramount to the user, HorizSampOpt is recommended.

In this section, we quantify the statistical significance of the top-ranking results of the selected algorithms. We show that we often find separating feature pairs that are more significant than the best single separating feature. To assess the significance of a separating feature or feature pair, we first calculate the p-value using the one-sided Fisher's exact test on a 2 × 2 contingency table. This contingency table is constructed with the rows being the true positive and negative labels, the columns being the predicted positive and negative labels, and the values being the number of objects that belong to each table cell. Using the Fisher's exact test p-value, we assert that feature [...]

Figure 7. For each test (x-axis), we show the significance (−log10(corrected pval)) of the top-100 best single features (grey dots) and feature pairs (blue dots) for the (a) MSigDB and (b) LINCS datasets. We order the datasets by their best corrected single-feature p-value, and discard the datasets where no single feature has a corrected p-value better than 10^-5.
Feature Pair. We next build the contingency tables and calculate the p-values for the top-k feature pairs. To correct for the m² possible feature pairs and the n² possible ways to choose the separating line for each feature pair, we apply a Bonferroni p-value correction to produce corrected pval = pval × m² × n². We plot the distribution of the corrected p-values for the top-k feature pairs in Figure 7. Once again, the threshold for defining a significant feature pair is set to 10^-5. We find that 10 out of 10 datasets in MSigDB and 27 out of the selected 32 datasets in LINCS have at least one significant feature pair by this metric. Visual comparison of the top-100 single features to the top-100 feature pairs (Figure 7) per dataset reveals several datasets where the corrected p-values of the feature pairs are more significant than those of the best single features, even after accounting for the larger search space. Admittedly, this is not always the case; e.g., for five LINCS datasets no feature pair was found to be significant at corrected pval ≤ 10^-5, while at least one single feature did meet this threshold. Overall, this analysis suggests that rapid discovery of top feature pairs may identify more significant patterns in a given dataset than a traditional single-feature analysis does. In the following, we further illustrate that feature pairs can also provide better and newer insights compared to single features.
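The one-sided Fisher's exact test on such a 2 × 2 table reduces to a hypergeometric tail sum, which can be computed with the standard library alone. This is an illustrative, stdlib-only sketch (the function name is an assumption, not the paper's code); in practice a library routine such as SciPy's `fisher_exact` would typically be used.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided ('greater') Fisher's exact test p-value for the 2x2
    contingency table [[a, b], [c, d]]: the probability, under fixed
    row and column margins, of observing a or more counts in the
    top-left cell (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    denom = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1)) / denom
```

A Bonferroni-corrected pair p-value would then simply be `fisher_exact_greater(a, b, c, d) * m**2 * n**2`, capped at 1.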

For all drugs except pravastatin, all of the top-1000 ranked feature pairs were found to be significant, i.e., −log10(corrected pval) > 5 (see Table 3). As described in Section 3.3, we are especially interested in feature pairs whose corrected p-value is better than the corrected p-values of their corresponding single features (−log10(improv quot) > 0). We found 1070 such "improved" feature pairs, with greater separability than their single features, among the top-1000 pairs of these evaluation drug sets.

One drug, trichostatin, had especially strong single features and showed no feature pairs that significantly improved on them. The remaining seven drugs, however, benefited from the feature pair analysis, yielding between 9 (tamoxifen) and 369 (doxorubicin) improved feature pairs (Table 3).

Many of the above-mentioned 1070 significantly improved feature pairs are partially redundant, in the sense that they comprise a common best-ranked single feature (gene). An example of this is the object set for the drug (small molecule) estradiol. We found the gene PRSS23 to be the single feature with the highest separability, and many feature pairs containing PRSS23 and a second gene to have an improved corrected p-value, for example (PRSS23, RAP1GAP), (PRSS23, TSC22D3), and (PRSS23, BAMBI). We looked for evidence of the relationship between the drug estradiol and these feature pair genes in the Comparative Toxicogenomics Database (CTD) [27] and with our own literature survey. From this search, we found evidence for the pronounced effect of estradiol in increasing expression levels of PRSS23 [28], RAP1GAP [29], and BAMBI [30], and decreasing expression of TSC22D3 [31]. So although the top single feature (gene PRSS23) recurred in multiple top feature pairs, each secondary feature gene was also meaningfully related to the administered drug in this case.

Table 3. For each chosen drug from LINCS, the number of experiments in the MCF7 cell line that were performed with that drug (NumExprs), and statistics for the top-1000 feature pairs for that drug, including the average −log10(corrected pval) (Avg Signif), the number of feature pairs with −log10(corrected pval) > 5 (Top1000 Signif), and the number with −log10(improv quot) > 0 (Top1000 Improved).

We next examined the 1070 improved feature pairs found over the 9 LINCS [...] second most improved feature pair for doxorubicin. This feature pair also has all three types of accompanying evidence; doxorubicin is known to increase expression of CDKN1A and CEBPB [34], and the pair of genes are annotated in STRING to have evidence for co-expression and text-mining relationships. This feature pair can be used to form an interesting hypothesis for further analysis or experiment. The potential for finding more significant and previously unidentified features is why GENVISAGE is designed to recover top-ranking feature pairs instead of just single features.

As discussed in Section 1, the output of GENVISAGE is not simply a ranking of the top feature pairs with their scores, but also a visualization that helps users interpret the separability. In Figure 8, we depict sample output visualizations from the MSigDB and LINCS runs. For MSigDB, we select the feature pair with the highest improved p-value, i.e., improv quot, using the SampOpt algorithm. For our LINCS representative, we visualize the gene feature pair (GLRX, NME7) for the drug vorinostat, as described in the previous section. For the MSigDB example (Figure 8(a)), we observe that the feature values for negative objects are clustered around zero, while the genes differentially expressed in papillary thyroid carcinomas from this MSigDB study have larger values overall, indicating stronger connections to the two Gene Ontology term features, cell adhesion and response to reactive oxygen species. This is consistent with studies that have highlighted the overexpression of important cell adhesion genes in thyroid cancer [35]. For the LINCS example (Figure 8(b)), positive objects mostly have elevated expression of the two reported genes (GLRX and NME7) compared to the negative objects. The direction of this differential gene expression for both genes is consistent with the literature on vorinostat experiments [32], [33]. These two examples illustrate how visualization of significant feature pairs can be a useful way to explain the separability of object sets and understand the data.

[...] like TRANSFORMATION that can pre-compute important quantities from the feature-object matrix before the positive and negative object sets are even provided.

One potential downside of the Rocchio-based measure is its dependence on linearity: feature pairs whose object class distributions form complex, non-convex, non-isotropic patterns are potentially very interesting, but will not be well ranked by GENVISAGE. Finally, in GENVISAGE, the optional SAMPLING and TRAVERSAL modules make stochastic or greedy decisions in order to estimate the quality of, and prune, the potential candidate feature pairs for evaluation.

While this greatly reduces the time required to find the top-ranking pairs, it has the potential to do so at the cost of ranking accuracy. Overall, we observed that for our settings, the sacrifice in accuracy was slight for the SampOpt feature pair rankings and more substantial when using the HorizSampOpt and VertSampOpt rankings with the greedy candidate traversal. However, users of GENVISAGE are able to tune the trade-off between performance and accuracy by modifying the sample size, |S|, used by the SAMPLING module, or the number of candidate feature pairs examined, χ, by the TRAVERSAL module, depending on the needs of their research and dataset.

Funding. This work was supported by the National Institutes of Health Big Data to Knowledge (BD2K) initiative [1U54GM114838, 3U54EB020406-02S1] and the National Science [...]

[...] homology from several databases. Our gene annotations and gene properties were extracted from Gene Ontology terms [36], PFam domains [37], and Reactome [38] and KEGG [39] pathways. We constructed a heterogeneous network with nodes for all 22,210 genes and 21,235 properties from these databases, and with edges representing the annotations between genes and properties. We also created weighted homology-based edges in the network between pairs of genes based on their protein sequence similarity as determined by BLAST scores [40]. We used the first phase of the DRaWR algorithm [24] with a restart probability of 0.5 to perform a random walk restarting from each gene node on the heterogeneous network, thereby scoring the connectivity of all nodes in the network to that gene. For each gene-property pair (g, r), we assigned the numeric value from the random walk stationary probability distribution, which represents not only whether the gene is annotated with that property, but also whether other genes closely related to gene g are annotated with property r. We thus obtained a feature-object matrix describing each gene (object) as a vector of its strength of association with each property (feature) in light of prior biological knowledge.

A.4 Speedup Analysis for LINCS
In Figure 5(b), we observe an over 400× average decrease in the running time of finding the top-k feature pairs that separate the LINCS experiments of a single perturbagen from the others. The greatest speedup comes from adding the SAMPLING module, with which only 100K feature pair candidates, i.e., |C|, are checked out of all 250M feature pairs (Supplementary Table B.2). For the selected algorithms with the best running times, HorizSampOpt and VertSampOpt, the pre-transformation and feature ordering overhead account for an average of 45 + 35 = 80s of the overall 104 and 94 median seconds, respectively.

A.5 Gene Interaction Datasets
For our knowledge bases of protein and gene interactions, we downloaded datasets derived from 8 data sources: STRING [41], Reactome [42], Pathway Commons [43], HumanNet [44], BioGRID [45], IntAct [46], DIP [47], and BLAST [40].

[...] total number of feature pairs. We order the single features by their individual separability scores. In horizontal traversal, only feature pairs with at least one individual feature ranked in the top 500 will be considered, while vertical traversal will consider only feature pairs with both individual features ranked better than 2000.

C.3 Separability Score Comparison
Comparison of brute force-based and Rocchio-based separability scores. (a) For each of the 10 datasets, we display the ratio of the true separability scores between the best feature pair chosen by brute force and by the Rocchio-based method. (b) For each dataset, we display the ratio of the true separability score to the Rocchio-based separability score for the best feature pair selected using the Rocchio-based method.