CONE: COntext-specific Network Embedding via Contextualized Graph Attention

Human gene interaction networks, commonly known as interactomes, encode genes' functional relationships, which are invaluable knowledge for translational medical research and the mechanistic understanding of complex human diseases. Meanwhile, the advancement of network embedding techniques has inspired recent efforts to identify novel human disease-associated genes using canonical interactome embeddings. However, one pivotal challenge persists: many complex diseases manifest in specific biological contexts, such as tissues or cell types, and many existing interactomes do not encapsulate such information. Here, we propose CONE, a versatile approach to generate context-specific embeddings from a context-free interactome. The core component of CONE is a graph attention network with contextual conditioning, trained in a noise contrastive fashion using contextualized interactome random walks localized around contextual genes. We demonstrate the strong performance of CONE embeddings in identifying disease-associated genes when using biological contexts known to be associated with the diseases. Furthermore, our approach offers insights into the biological contexts associated with human diseases.


Introduction
The proper operation of cells depends on the precise coordination and interaction of biological entities, such as genes, RNA, and proteins. As a result, complex human diseases are the ramifications of perturbations to groups of genes that give rise to pathological states [1,2]. Leveraging this interdependence among biological entities, network-based methods have shown great promise in unveiling human genes' functions [3] and their associated diseases [4,5]. Recent approaches achieve this by training machine learning models using network embeddings extracted from the input gene network [3,6]. However, a crucial limitation remains: many biological network embedding methods do not consider the differences induced by various biological contexts.
The interacting relationships among genes vary across biological contexts, such as tissues, cell types, or disease states. Many human genes operate in a tissue-dependent manner; for example, DMD is preferentially expressed in muscle [7]. The heterogeneity of the specific set of genes expressed in a particular biological context contributes significantly to the phenotypic diversity within an organism, supporting the complex, specialized functions required for its survival [8]. Consequently, the dysfunction of genes can lead to diseases manifesting only in specific tissues. For example, Mendelian disorders show clear tissue-specific manifestation, and complex diseases such as neurological disorders and cardiovascular diseases have a strong tendency toward tissue selectivity [9]. However, tissue-disease associations are not always straightforward. Apart from the primarily affected tissues, diseases may affect seemingly unrelated tissues. One prime example is the high risk of gastrointestinal tract dysfunction observed in patients with Parkinson's disease, a neurological disorder primarily centered in the brain [10]. Such cryptic connections between diseases may be partially explained by shared underlying mechanisms, which can be well characterized by networks [11].
Numerous functional genomics projects generate data of diverse types, qualities, and scopes for genes or molecules [12][13][14]. To obtain high-quality and comprehensive network embeddings, several methods have been developed to infer a joint network representation by integrating multiple networks [15][16][17][18][19]. However, a drawback of network integration methods is that the integration process can eliminate context-specific information in each input network, resulting in a context-naive network.
In other words, it may assume the same molecular interactions in the kidney and brain, whereas in reality, interactions are tissue-specific. To predict a range of tissue-specific gene functions or gene-disease relationships, we must integrate context-specific information into the network. Furthermore, state-of-the-art data integration approaches using graph neural networks [16] may not scale well to the number or size of the networks. Therefore, we require a scalable method that can handle networks of varying sizes and numbers.
Contributions. Here, we address the critical need for a versatile and scalable method for generating biological context-specific network embeddings. We summarize our main contributions as follows.
1. We propose CONE, a versatile contextual network embedding method that takes context definitions in the form of node sets.
2. The proposed method operates on a graph attention network shared across all contexts, which is contextualized by conditioning on the raw embeddings. This results in a model that scales practically independently of the number of contexts.
3. Through a series of experiments, we demonstrate the value of injecting various biological contexts to improve disease gene prioritization.

Related work
A few studies have explored the idea of contextualizing biological network embeddings using contexts such as tissue or cell type specificity. Notably, OhmNet [20] pioneered tissue-specific gene interaction network embedding by leveraging the hierarchical relationships between different tissue levels and genes. OhmNet learns a multi-layer embedding and operates on the idea that closely related tissues, or layers, should have similar embeddings. However, the original OhmNet method requires a highly specific construction of the hierarchical multi-layer tissue-specific networks, making it hard to extend readily to broader biological contexts. More recently, PINNACLE [21] further expanded the biological contexts into finer-grained definitions based on cell types, using cell-type-expressed genes constructed from the Tabula Sapiens single-cell atlas [22]. Furthermore, PINNACLE learns context-specific graph attention modules with independent parameters per context.

Network Embedding via Sampling
Let G = (V, E, w) be a weighted undirected graph with edge weight function w : V × V → R, and denote its corresponding adjacency matrix by A ∈ R^(|V|×|V|). A graph embedding method aims to find a mapping f : V → R^d that maps each node v ∈ V to a d-dimensional embedding space by minimizing the following objective:

min_f E_(u,v)∼S+ [ℓ+(f(u), f(v))] + E_(u,v)∼S− [ℓ−(f(u), f(v))]    (1)
where S+ and S− are positive and negative edge sampling functions and ℓ+, ℓ− are the corresponding per-pair losses. In particular, Singular Value Decomposition (SVD) can be viewed as an instance of equation 1 in which S+ and S− are both uniform samplers over all pairwise entries of the adjacency matrix and f maps to the left (f_L) and right (f_R) embedding representations [23]. The loss function is the squared error between the inner product of the left and right embeddings of the two nodes and the edge weight between them in the graph:

ℓ(u, v) = (⟨f_L(u), f_R(v)⟩ − A_(u,v))²

Random Walk Sampling. Random walks on graphs have been studied extensively, with applications spanning social network analysis, information retrieval, and beyond. In our framework, the random walk procedure can be seen as the node-pair sampling function. For instance, node2vec [24] with negative sampling can be reformulated in a noise contrastive fashion [25] as

L_RW(f) = −E_(u,v)∼S+ [log σ(⟨f(u), f(v)⟩)] − k · E_(u,v′)∼S− [log σ(−⟨f(u), f(v′)⟩)]

where σ is the sigmoid function, k is the number of negative samples, and the positive sampling is achieved by a sliding window over a second-order biased random walk [24]. We refer to the above as the random walk loss.
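To make the random walk loss concrete, here is a minimal NumPy sketch of the noise contrastive objective above; the pair lists stand in for the positive (sliding-window over random walks) and negative (noise) samplers, which are not implemented here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def random_walk_loss(emb, pos_pairs, neg_pairs, k=5):
    # emb: (n, d) node embeddings; pos_pairs/neg_pairs: lists of (u, v) indices.
    # Positive pairs come from sliding windows over random walks; negative
    # pairs are drawn from a noise distribution.
    pos = np.array([emb[u] @ emb[v] for u, v in pos_pairs])
    neg = np.array([emb[u] @ emb[v] for u, v in neg_pairs])
    return -np.log(sigmoid(pos)).mean() - k * np.log(sigmoid(-neg)).mean()
```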

Graph Attention Neural Network (GAT)
Graph neural networks (GNNs) are a class of neural network architectures that operate on the underlying graph structure. They do so by iteratively aggregating information from each node's neighborhood and transforming the aggregated representations [26,27]. In particular, GAT [28,29] uses an attention mechanism to weight each node's neighborhood for aggregation, and the (pre-activation and pre-normalization) layer updating rule is written as follows:

h′_v = Σ_(u ∈ N(v) ∪ {v}) α_(u,v) W h_u

where α_(u,v) is the attention score and W ∈ R^(d×d) is a learnable linear transformation. In practice, we use the v2 corrected attention proposed in [29]:

α_(u,v) = softmax_u (a⊤ LeakyReLU(W [h_u ∥ h_v]))
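As an illustration of the aggregation rule and the v2 corrected attention, the following NumPy sketch computes one (pre-activation) GAT layer. It is a dense, loop-based toy implementation for clarity, not the library code used in the experiments.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gatv2_layer(H, adj, W, a):
    # H: (n, d) node features; adj: (n, n) binary adjacency (no self-loops);
    # W: (d, d) shared transform; a: (2*d,) attention vector.
    HW = H @ W
    out = np.zeros_like(HW)
    for v in range(H.shape[0]):
        nbrs = np.append(np.nonzero(adj[v])[0], v)  # neighborhood plus self
        # GATv2 score: a^T LeakyReLU(W [h_v || h_u]) -- nonlinearity before a
        pair = np.concatenate(
            [np.repeat(HW[v][None, :], len(nbrs), axis=0), HW[nbrs]], axis=1)
        e = leaky_relu(pair) @ a
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                        # softmax over neighbors
        out[v] = alpha @ HW[nbrs]                   # attention-weighted sum
    return out
```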

Our Method
We are interested in learning a collection of network embeddings, each specific to a biological context. For example, we can use heart-specific gene embeddings to unravel more tissue-specific genes related to cardiovascular diseases. Contextualizing gene embeddings to biological contexts this way allows us to unveil nuanced relationships between diseases and biological contexts, such as tissues, cell types, and other diseases or traits. The full pipeline of our approach is depicted in Figure 1.
From a high level, CONE contains two main components: (1) a GNN decoder and (2) an MLP context encoder. The GNN decoder converts the raw, learnable node embeddings into the final embeddings. The MLP context encoder, in turn, projects the context-specific similarity profile describing the relationships among different contexts (Section 4.2) into a condition embedding. When added to the raw embeddings, the condition embedding provides high-level contextual semantics, similar to the widely used positional encodings in Transformer models [30]. The embeddings are trained using losses based on random walks on the context-specific subgraphs. We employ a straightforward approach that defines a context-specific subgraph as the subgraph induced by the genes relevant to that context. Next, we formally describe our approach.

Contextualized network embeddings
Let C = {C_i}_(i ∈ 1,...,n_c) be a collection of n_c contexts, where each context C_i ⊂ V is a subset of nodes that defines the local context. We aim to learn a collection of embedding functions F = {f_C} by

min_F L_RW(f_0) + Σ_(C ∈ C) L_RW^C(f_C)    (6)

where f_0 is the context-naive embedding optimized against the whole network G, and f_C is the context-specific embedding optimized against the contextual random walk loss L_RW^C, which samples random walks on the subgraph induced on the context set C, that is, G(C) = (C, {(u, v) ∈ E : u, v ∈ C}, w). Equation 6 simultaneously optimizes the global and local contextualized representations of all nodes v in the network. A naive attempt to obtain the contextualized embeddings in equation 6 would be to learn an independent f_C on each corresponding contextual graph G(C). However, the resulting context-specific embeddings may completely lose the global information of the graph, since each f_C operates independently. We provide empirical evidence for this in Section B.2.
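The induced contextual subgraph G(C) is straightforward to compute; a minimal sketch over an edge list:

```python
def induced_subgraph(edges, context):
    """Keep only edges whose endpoints both lie in the context node set."""
    ctx = set(context)
    return [(u, v) for u, v in edges if u in ctx and v in ctx]
```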

Contextualized GAT
To address the above-mentioned problem, we propose to learn a shared embedding encoder using GAT, and to contextualize the embeddings by conditioning on the raw embedding matrix.
Let g_θ : R^(|V|×d) → R^(|V|×d) be a GAT network parameterized by θ, and Z ∈ R^(|V|×d) the raw embedding matrix, which is randomly initialized. Drawing parallels from recent work on conditional generation [31], we view context-specific embeddings as generation conditioned on a specific context, and propose to compute the contextualized embedding f_C as

f_C(V) = g_θ(Z + 1 ϕ(C))

where ϕ(C) ∈ R^(1×d) is the context condition embedding that defines the context C, broadcast across all nodes. The context-naive embeddings are computed by passing the raw embedding alone through the GAT encoder:

f_0(V) = g_θ(Z)

Finally, to form the full context-specific embedding for downstream evaluation, we concatenate it with the context-naive embedding and then project it down to d dimensions via PCA.
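The conditioning step itself can be sketched in a few lines, with `gat` standing in for the trained shared encoder g_θ (any callable mapping (n, d) arrays to (n, d) arrays):

```python
import numpy as np

def cone_embedding(gat, Z, phi_c=None):
    # Context-specific: shift every raw embedding by the condition embedding
    # before the shared encoder; context-naive: pass Z through unchanged.
    X = Z if phi_c is None else Z + phi_c[None, :]
    return gat(X)
```

Because the same `gat` parameters serve every context, adding a new context only adds one d-dimensional condition vector, not a new encoder.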
Context condition embedding. The context condition embedding provides high-level semantics about each context, and two contexts with highly overlapping sets of nodes should have similar condition embeddings. To that end, we encode condition embeddings using the context similarity matrix J ∈ R^(n_c×n_c), constructed by taking the Jaccard index between all pairwise contexts, J_(i,j) = |C_i ∩ C_j| / |C_i ∪ C_j|. Finally, we use a two-layer Multi-Layer Perceptron (MLP) to project J into the condition embeddings, so for each context C_i, its corresponding condition embedding is computed as ϕ(C_i) = MLP(J)_[i,:].
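A minimal sketch of the context similarity matrix J; the subsequent MLP projection is omitted, since any two-layer MLP applied row-wise would do:

```python
import numpy as np

def context_similarity(contexts):
    # J[i, j] = |C_i ∩ C_j| / |C_i ∪ C_j| over all pairs of context node sets.
    sets = [set(c) for c in contexts]
    n = len(sets)
    J = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            J[i, j] = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
    return J
```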

Training CONE
The loss function defined in equation 6 is implemented in practice by alternating between the context-naive random walk loss L_RW(f_0) and the context-specific random walk loss L_RW(f_C) for a randomly drawn context. We train the model for 120 epochs using the AdamW [32] optimizer with a constant learning rate of 0.001 and a weight decay of 0.01. Hyperparameter selection details can be found in Appendix C.2.
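The alternating schedule can be sketched as follows, with `naive_step` and `context_step` as hypothetical stand-ins for the optimizer updates on the two losses:

```python
import random

def train_schedule(contexts, n_epochs, naive_step, context_step, seed=0):
    # Alternate one context-naive update with one update on a random context.
    rng = random.Random(seed)
    for _ in range(n_epochs):
        naive_step()
        context_step(rng.choice(contexts))
```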

Complexity analysis
As the main module of CONE, GAT has a computational complexity of O(|V|d² + |E|d) [28,29]. In addition, the context condition embedding encoder ϕ scales linearly with the number of contexts as O(n_c d). In practice, since n_c ≪ |E|, the computational complexity of CONE is effectively equivalent to that of a single GAT network. This effectively constant scaling with respect to the number of contexts stands in stark contrast to the recently proposed PINNACLE, which scales linearly with the number of contexts due to its independent GAT module per context. We provide empirical evidence for the scalability of CONE in Appendix A.

Setup
We devise diverse biomedical tasks to evaluate the capability of CONE against baseline methods for prioritizing genes in the gene interaction network. These are binary classification tasks in which the goal is to identify human genes related to certain diseases using the gene network embeddings generated by the models. We conduct our main analysis using the PINPPI network, a combined network built from BioGRID [33], Menche [34], and HuRI [35], provided by the PINNACLE [21] paper. For gene label information, we collect the two therapeutic target tasks (RA and IBD) from PINNACLE. Furthermore, we compile a comprehensive collection of disease-gene annotations from DisGeNET [36], following the processing steps detailed in [3]. After filtering out diseases with fewer than ten positive genes intersecting with the PINPPI network, the final DisGeNET benchmark contains 167 diverse human diseases.
For each DisGeNET disease gene prioritization task, we randomly split the positive and negative genes into train/validation/test sets with a 6/2/2 ratio. The final prediction performance is reported as the average test score across five different random splits. For RA and IBD, we use the pre-defined train/test split given by PINNACLE. Detailed dataset statistics and processing notes can be found in Appendix C.1.

Baselines. node2vec is a strong baseline for network embedding-based gene prioritization, with superior performance on various benchmarks [37]. We also include embeddings generated by a two-layer GAT (v2) network [28,29] trained in a standard graph autoencoder style [38] as a more direct baseline against CONE. Moreover, BIONIC [16] and Gemini [19] are two recent approaches that learn an integrated embedding across a collection of networks. We use them to test whether embedding multiple context-specific subgraphs together gives an advantage over embedding a single context-naive network. All baselines and the CONE embeddings are evaluated in an unsupervised setting, where an ℓ2-regularized logistic regression model is trained for each task using embeddings learned without access to any label information.
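A sketch of this unsupervised evaluation protocol: an ℓ2-regularized logistic regression fit on frozen embeddings. To stay self-contained, this uses a minimal gradient descent fit rather than a library solver; the split indices are assumed to come from the 6/2/2 procedure above.

```python
import numpy as np

def l2_logistic_scores(emb, y, train_idx, test_idx, lam=1.0, lr=0.1, steps=500):
    # Fit w on the frozen training embeddings, then score the test genes.
    X, t = emb[train_idx], y[train_idx]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))                    # predicted P(y=1)
        w -= lr * (X.T @ (p - t) + lam * w) / len(t)        # l2-penalized step
    return 1.0 / (1.0 + np.exp(-emb[test_idx] @ w))
```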
For context-specific network embeddings, we consider a recently proposed method, PINNACLE [21], which learns separate GAT modules for each context. We directly use the context-specific embeddings provided by the paper to reanalyze its performance under our fair setting. We point out that PINNACLE context-specific embeddings differ slightly from CONE in that PINNACLE only generates embeddings for the context-specific nodes. Conversely, CONE generates embeddings for all nodes, regardless of whether they are specific to the context. This enables us to evaluate all context-specific embeddings fairly across diverse disease gene prioritization tasks. Due to this limitation of PINNACLE, we exclude it from the main disease gene prioritization benchmark (RQ1). We set the embedding dimension to 128 for all models.
Context-specific gene sets. We primarily consider tissue specificity for contextualizing the network embeddings. This leads to a natural interpretation of the downstream disease gene prediction performance based on the association between tissues (contexts) and diseases (tasks). One widely adopted way of defining tissue- or cell-type-specific genes is by differential gene expression [39], where genes that are expressed significantly more in one context than others are considered relevant to that context. Following this, we first obtain tissue-specific gene expression from the GTEx project [40], and then extract tissue-specific genes by taking genes with z-scores greater than one across tissues. In addition to tissue-specific genes, we also showcase CONE using other biological contexts.

Metrics. We use the log2 fold-change of the average precision over the prior (APOP) as the main metric for evaluating disease gene prioritization performance [3]. The area under the precision-recall curve, which is closely related to the average precision, is a better choice for evaluating tasks with severe class imbalance [41]. The division by the prior, in turn, corrects for the expected performance of a random guesser on tasks with different numbers of positive examples. For the RA and IBD therapeutic target prediction tasks, we follow PINNACLE and report the average precision and recall at rank five (APR@5) in addition to APOP.
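A minimal sketch of the APOP metric, assuming the standard definition of average precision (mean precision at the ranks of the positives) and the positive rate as the prior:

```python
import numpy as np

def average_precision(y_true, y_score):
    """Average precision: mean of precision@k over the ranks of positives."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    y = y_true[np.argsort(-y_score)]        # labels sorted by descending score
    ranks = np.nonzero(y)[0] + 1            # 1-based ranks of the positives
    hits = np.arange(1, len(ranks) + 1)     # cumulative positives at those ranks
    return np.mean(hits / ranks)

def apop(y_true, y_score):
    # log2 fold-change of average precision over the prior (positive rate),
    # i.e. over a random ranker's expected average precision.
    return np.log2(average_precision(y_true, y_score) / np.mean(y_true))
```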

Results and Discussions
RQ1. Can context-specific embedding improve disease gene prediction performance? We first observe that, overall, picking the best context for each disease achieves a noticeable performance improvement over the context-naive CONE embeddings, as indicated by the good performance of CONE (best) in Figure 2. Moreover, the advantage of using context-specific embeddings is most apparent when few positive genes are available for a disease. This might be attributable to the fact that diseases with more associated genes are more likely to contain more ubiquitous and less context-specific genes, consequently reducing the effectiveness of context-specific embeddings. In all cases, we note that either the context-naive or the context-specific CONE embedding consistently matches the performance achieved by the node2vec baseline.

RQ2. Are the top-performing contexts biologically relevant? Despite the performance improvement from the CONE contextualized embeddings, it remains unclear whether the biological contexts that led to good performance on a particular disease are actually associated with that disease. To address this question, we manually inspect six diseases for which the connection between tissue and disease manifestation appears readily evident, as shown in Table 1. We hypothesize that CONE embeddings derived from a disease-related tissue should achieve a higher APOP than the context-naive CONE or node2vec embeddings. Indeed, many of the top contexts found by CONE are biologically meaningful, whether as one of the main affected tissues or as a related tissue. For example, subvalvular aortic stenosis and familial bicuspid aortic valve both involve a problem with the aortic valve, the valve of an artery in the heart; their top-performing contexts included artery for the former and heart for the latter. More subtle but biologically relevant top-performing contexts are small intestine and adipose for hypochromic microcytic anemia. Hypochromic microcytic anemia is typically caused by decreased iron reserves in the body, which may be due to decreased dietary iron, poor absorption of iron from the gut, or acute or chronic blood loss [42]. Iron absorption is primarily carried out by cells in the small intestine [43], explaining why it would be a top context for anemia. Obesity, an excessive accumulation of adipose tissue, has also been molecularly linked to iron deficiency in a way that shows the two conditions mutually affect one another [44]. Also of note, the primary affected tissue of pure red-cell aplasia, blood, was not identified in the top three contexts. However, patients with hepatitis, an inflammation of the liver, sometimes develop pure red-cell aplasia [45,46]; CONE did manage to highlight this cryptic association between the liver and pure red-cell aplasia. Together, these results imply that CONE can help uncover subtle disease-tissue relationships. Thus, CONE contextualized embeddings not only achieve good prediction performance, but their top-performing contexts also typically show biological relevance.
RQ3. Can we further enhance therapeutic target prediction by encoding better contextual network information? Incorporating biological context information has recently been shown to be beneficial for predicting therapeutic targets in complex diseases such as rheumatoid arthritis (RA) and inflammatory bowel disease (IBD) [21]. Therapeutic target prediction for a particular disease is a binary classification problem that aims to predict whether a particular human gene can serve as a point of intervention for treating that disease. Compared to CONE, PINNACLE [21] takes an alternative approach, constructing a heterogeneous network that introduces the biological contexts as a type of node in the heterogeneous biomedical graph. Furthermore, the PINNACLE model learns an individual set of parameters for each biological context, in contrast to our unified model, which shares the same set of parameters across all biological contexts. We hypothesize that our approach of tying weights induces more regularity from the underlying graph and, as a result, produces better contextualized embeddings for predicting therapeutic targets.
To test this, we first use the cell-type-specific genes processed by PINNACLE to generate cell-type-specific CONE embeddings. We then follow the original evaluation and measure the performance of the different contextualized embeddings using APR@5. We note that the PINNACLE context-specific embeddings only contain embeddings of genes within the context. Conversely, CONE context-specific embeddings are genome-wide, meaning that the embeddings for any context contain the same number of genes, spanning the whole network. Thus, to unify the setting between PINNACLE and CONE, we subset each context-specific CONE embedding to the corresponding context genes.
As shown in Figure 3, our CONE embeddings achieve significantly better performance than the PINNACLE embeddings when used in an unsupervised embedding setting, in which the training of the embedding does not involve node label information for the downstream prediction tasks. Furthermore, the highlighted immune cell contexts show that CONE better prioritizes the relevant cell contexts of IBD and RA, both of which are autoimmune diseases resulting from the malfunction of immune cells. Among the top-performing cell contexts for RA, CONE achieves better performance than PINNACLE and reveals contexts that are biologically related to RA. For example, pancreatic acinar cells (rank of APOP=2, rank of APR@5=1) secrete digestive enzymes involved in digestion within the small intestine. The early activation of these digestive enzymes before they reach the duodenum can trigger the onset of acute pancreatitis [47]. Acute pancreatitis, in turn, is highly associated with RA: clinical studies have shown that RA patients were 2.51 times more likely to develop acute pancreatitis [48]. CONE is thus able to reveal hidden associations based on cell-type-specific networks. Similarly, CONE performs better than PINNACLE in identifying the top relevant cell contexts related to IBD, picking up duct epithelial cells (rank of APOP=1, rank of APR@5=1) as the top cell context. These cells are integral to the intestinal barrier, serving as the first line of defense against invading microorganisms [49]. In IBD patients, however, the proper function of the intestinal barrier is frequently compromised to varying degrees [50].
Overall, these examples demonstrate the superior power of CONE in predicting therapeutic targets using biologically relevant cell contexts.
RQ4. Can CONE leverage biological contexts other than tissue or cell type? Since CONE takes contextual information in the form of node sets, it can be extended to biological contexts beyond the traditionally used tissue and cell type contexts [20,21]. Here, we explore the extensibility of our CONE approach to different biological contexts in two ways. First, we re-evaluate the RA and IBD prediction performance using CONE trained on different disease contexts defined by differentially expressed genes obtained from CREEDS [51]. We find that the top-ranked contexts for both RA and IBD are indeed highly relevant disorders (Figure 4). For example, psoriasis is one of the top disease contexts related to RA (rank of APOP=3). A clear connection between these two conditions is psoriatic arthritis, a form of arthritis accompanied by the skin rash common in psoriasis [52]. This indicates that similar genetic programs are shared by both diseases [53], which CONE reveals using disease context networks. Furthermore, CONE also reveals connections between RA and some seemingly unrelated diseases, such as heart failure (rank of APOP=115). Notably, a recent study confirmed that RA patients have a two-fold higher risk of heart failure mortality than those without RA [54].
Similarly, CONE unveils meaningful relationships between IBD and other disease contexts. Cystitis (rank of APOP=3) is one of the top disease contexts identified by CONE. Clinical studies have shown that cystitis, an inflammation of the bladder, leads to a significant increase in the risk of IBD [55][56][57]. For example, individuals with interstitial cystitis, a condition involving an inflamed or irritated bladder wall, are 100 times more likely to have IBD [57]. CONE also finds subtle relationships between IBD and neurological complications, exemplified by epilepsy syndrome (rank of APOP=114) and autism spectrum disorder (rank of APOP=112) in the top list [58]. Neurological complications affect 0.25% to 47.50% of IBD patients [59] and are associated with neuroinflammation or an increased risk of blood clots in brain veins [60]. Some diseases may even have a protective effect on other diseases: CONE identified such a protective relationship between Helicobacter pylori gastrointestinal tract infection (rank of APOP=2) and IBD, as H. pylori infection helps protect against IBD by inducing systemic immune tolerance and suppressing inflammatory responses [61]. Overall, these examples further confirm that CONE can leverage disease contexts to reveal both apparent and cryptic associations between complex diseases.
Finally, we rerun the DisGeNET benchmark using a diverse collection of context-specific gene sets, spanning tissues, cell types, and diseases, retrieved from various databases, including CellMarker 2.0 [62], the Human Protein Atlas [63], and the TISSUES database [64]. We observe that CONE performs similarly under all collections of contexts tested (Figure B.3). Together with the fact that CONE performs competitively against baselines (RQ1) and captures biologically meaningful contextual information (RQ2), we believe that CONE is a versatile and effective approach for scalably generating biological network embeddings conditioned on specific biological contexts.

RQ5. Can CONE transfer to unseen contexts?
The MLP context encoder used by CONE makes it possible to generate embeddings for contexts that are not observed during training. This is done by feeding the similarity profile of the new query context against all training contexts to the MLP encoder. To demonstrate the effectiveness of transferring to unseen contexts, we retrain the CONE GTEx tissue-specific embeddings, this time leaving out the Heart context during training. We observe that holding out the Heart context does not significantly affect the disease gene classification performance, with a correlation coefficient > 0.8 against the original performance. Furthermore, we compile two lists of diseases for which the Heart context achieved a top-five performance, one for the original and one for the held-out-Heart version of CONE. We find a significant overlap between the two lists (hypergeometric p-value < 0.05), with seven common diseases (Table B.2). One notable example is familial bicuspid aortic valve, a known common congenital heart defect [65]. These results highlight the effective transferability of CONE to unseen contexts, with embedding quality similar to that obtained when the contexts are seen during training.
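A sketch of how an unseen context's condition embedding could be obtained, with `mlp` as a stand-in for the trained context encoder: the new node set's Jaccard profile against every training context replaces the row of J it never had.

```python
import numpy as np

def unseen_condition(new_context, train_contexts, mlp):
    # Jaccard profile of the unseen node set against each training context,
    # fed through the trained MLP encoder (`mlp` is a hypothetical callable).
    new = set(new_context)
    profile = np.array(
        [len(new & set(c)) / len(new | set(c)) for c in train_contexts])
    return mlp(profile)
```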

Conclusion
We proposed CONE, a flexible and scalable framework that can inject arbitrary contextual information into gene interaction networks. Our study underscores the efficacy of CONE in enhancing the prioritization of genes within the gene interaction network. CONE consistently demonstrated performance superior to baseline methods across various biomedical tasks. Crucially, the introduction of context-specific embeddings resulted in significant performance gains, especially when positive genes for a disease were limited. Moreover, the contexts identified by CONE were found to be biologically relevant, suggesting that the method not only boosts prediction accuracy but also provides biologically meaningful insights. This ability to integrate diverse biological contexts, from tissues and cell types to diseases, positions CONE as a versatile tool for uncovering both explicit and cryptic relationships within biomedical datasets.
Limitations and future directions. Constructing context-specific subnetworks solely as the subgraph induced by context-specific genes is a reasonable but overly simplistic assumption. In reality, context-specific gene interactions are complicated and encompass diverse mechanisms, ranging from interactions mediated by non-coding RNAs to the influence of epigenetic modifications and signaling pathways. Consequently, a promising advancement would involve carefully constructed context-specific gene interaction networks that account for these intricate nuances, such as HumanBase [66], HIPPIE [67], IID [68], and many others [69][70][71].

A Scalability experiment
We empirically demonstrate the scalability of CONE against three other related methods: GAT, BIONIC, and PINNACLE. All these methods use the GAT module as the main encoding component. CONE and GAT both employ a single GAT module, while BIONIC and PINNACLE use individual GAT modules for different contexts.
Setup. We consider two types of scaling experiments: the number of contexts and the context node percentage. For the number of contexts, we fix the number of context-specific nodes to about 5% of the total number of nodes and vary the number of contexts from 10 to 1,000. For the context node percentage, we fix the number of contexts at 100 and vary the context node percentage from 1% to 50%. For both sets of experiments, we use a synthetic network with 10,000 nodes generated using the Barabási-Albert model [72], with a density of approximately 0.01. We report the peak CUDA memory usage (in bytes) of the model's forward pass using the torch.cuda.max_memory_allocated() function.
Implementation details. All experiments are conducted on compute nodes with 5 CPUs, 45 GB of memory, and a Tesla V100 GPU (32 GB). We uniformly set the following hyperparameters across models: 128 dimensions and one layer. Furthermore, BIONIC and PINNACLE require subgraph-batched training, for which we set the batch size to 2048 for both models. CONE and GAT, on the other hand, employ full-batch computation.

Results
The empirical scalability results are shown in Figure A.1. We first highlight that CONE scales well, with minimal overhead as more contexts are introduced. Remarkably, CONE's memory consumption is comparable to that of the plain GAT model, which does not take context into account. This aligns well with our complexity analysis (Section 4.4).
Conversely, BIONIC and PINNACLE react drastically to the number of contexts, with BIONIC running out of memory beyond 500 contexts. Furthermore, PINNACLE exhibits a severe scalability issue with respect to the context node percentage: its memory consumption increases drastically as the context subgraphs grow. These results showcase the scalability advantage of CONE's single shared GAT module, which decodes the embeddings for all contexts.

B.1 Effects of PCA dimensionality reduction
The final CONE context-specific embeddings are obtained by first concatenating the context-naive embeddings with the context-specific embeddings, and then applying PCA to reduce the dimensionality by half.
Combining the context-naive and context-specific embeddings gives the final embedding a more comprehensive view of both the global and local (context-specific) semantics. Dimensionality reduction is applied so that the final context-specific embeddings can be compared fairly against the context-naive embeddings. PCA is a common dimensionality reduction technique due to its simplicity and has been used in previous studies to combine multiple views of embeddings, such as Walklets [73].
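The concatenate-then-PCA step can be sketched as follows; this is a minimal numpy illustration with toy random embeddings (the function name and dimensions are ours, not the paper's), projecting the 2d-dimensional concatenation back down to d dimensions via the top principal components.

```python
import numpy as np

def combine_embeddings(context_naive, context_specific):
    """Concatenate two embedding views and PCA-project back to the original width."""
    X = np.concatenate([context_naive, context_specific], axis=1)  # (n, 2d)
    Xc = X - X.mean(axis=0, keepdims=True)             # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    d = context_naive.shape[1]
    return Xc @ Vt[:d].T                               # top-d principal components

rng = np.random.default_rng(0)
naive = rng.normal(size=(200, 16))     # toy context-naive embeddings
specific = rng.normal(size=(200, 16))  # toy context-specific embeddings
combined = combine_embeddings(naive, specific)  # shape (200, 16)
```

The output has the same number of dimensions as either input view, which is what makes the comparison against the context-naive embeddings fair.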
One remaining question is how the performance changes before and after applying PCA. Here, we compare the performance of the fully concatenated and PCA-reduced versions of CONE following the DisGeNET benchmarking setting in RQ1. We observe little performance difference between the two versions of CONE (Figure B.1). Thus, we set the final CONE to use PCA, as it provides a fairer setting in terms of the number of dimensions while achieving the same performance.

B.2 Ablation studies
In the following, we investigate the effectiveness of the main design choices of CONE, including the context similarity measure and the MLP context encoder. We follow the same experimental settings as in RQ1, using the DisGeNET gene classification benchmark.
Context similarity measures. Besides the default Jaccard similarity measure, we consider three other similarity measures: cosine similarity, the radial basis function, and the Spearman correlation coefficient. Table B.1 shows that the choice of similarity has marginal effects on performance, with the default Jaccard similarity consistently achieving better or equivalent performance compared to the other choices. Figure B.2 further indicates that there are no significant performance differences across the similarity measures according to the paired Wilcoxon test.
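For reference, the default Jaccard similarity between two contexts' gene sets is simply the size of their intersection over the size of their union; the gene symbols below are arbitrary examples, not contexts from the paper.

```python
def jaccard(genes_a, genes_b):
    """Jaccard similarity between two gene sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Two hypothetical context gene sets sharing two of four genes in their union.
sim = jaccard({"TP53", "BRCA1", "EGFR"}, {"BRCA1", "EGFR", "MYC"})  # 2/4 = 0.5
```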
Context encoder. A trivial way to encode a context is one-hot encoding, which is equivalent to directly learning an embedding for each context. We call this approach Embedding. We observe that Embedding achieves the lowest average performance across all groups of tasks (Table B.1). In the case of the disease task group [23, 42), the Embedding performance is significantly worse than that of the default CONE (
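To make the contrast concrete, the numpy sketch below (toy dimensions and random weights, purely illustrative) shows why a one-hot Embedding encoder amounts to learning an independent vector per context, whereas an MLP over context similarity profiles lets contexts with similar profiles share information.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, emb_dim, hidden = 5, 8, 16

# "Embedding" baseline: a one-hot input times a weight matrix is just a
# row lookup, so each context's vector is learned independently.
W = rng.normal(size=(n_contexts, emb_dim))
one_hot = np.eye(n_contexts)
embedding_out = one_hot @ W  # identical to W itself

# MLP encoder: contexts are represented by similarity profiles (e.g. pairwise
# Jaccard similarities), so contexts with similar profiles are mapped to
# nearby embeddings rather than being learned in isolation.
sim_profiles = rng.random(size=(n_contexts, n_contexts))
W1 = rng.normal(size=(n_contexts, hidden))
W2 = rng.normal(size=(hidden, emb_dim))
mlp_out = np.maximum(sim_profiles @ W1, 0.0) @ W2  # one hidden ReLU layer
```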

Figure 1: Overview of CONE embedding collection training and inference.

Figure 2: DisGeNET disease gene prediction performance comparison between node2vec and CONE embeddings. Each point in the box plot corresponds to the prediction test performance of a disease, averaged across five random splits. Different panels show groups of diseases with different numbers of positive genes. For example, the left-most panel contains 31 diseases with at least 10 but fewer than 13 positive genes. ns, *, and ** indicate the significance level of the paired Wilcoxon test between the baseline node2vec and CONE (ns: not significant; *: 0.01 < p-value ≤ 0.05; **: p-value < 0.01).

Figure 3: Therapeutic area predictions for RA and IBD. Each point represents the APR@5 score achieved when using a particular cell-type-specific embedding to predict the RA or IBD therapeutic targets. Immune cell contexts are highlighted in orange and pink for CONE and PINNACLE, respectively.

Figure 4: Sorted disease contexts for the therapeutic area predictions for RA and IBD.

Figure A.1: Model scalability across different contextualization settings. The star indicates the point beyond which the model runs out of memory.

Figure B.1: Effect of CONE dimensionality reduction. Performance comparison between PCA-reduced CONE (x-axis) and non-reduced CONE (y-axis) in terms of test APOP on the DisGeNET disease gene classification benchmark.

Table 1: Top performing contexts for selected diseases in the DisGeNET benchmark. Performance is reported as test APOP scores averaged across five random splits. The top contexts are sorted in descending order from left to right. For example, the Heart context achieved the highest score for Nemaline myopathy. The specific constructions of the context-specific gene sets (including cell types and diseases) and their statistics can be found in Appendix C.1.

Table 2: Top four performing contexts for predicting RA and IBD therapeutic targets. The first block of rows shows the top four cell types with the highest APOP scores when predicting RA and IBD using PINNACLE embeddings; the second block shows those for the CONE embeddings.

Table B.1: Ablation study of context similarity and context encoding strategies using the DisGeNET benchmark. Results are reported as APOP scores averaged across tasks within a group, based on the number of positive examples.

Table B.2: List of diseases for which the Heart context appears among the top five performing contexts in terms of test APOP scores. The last two columns indicate whether the disease shows up for the original CONE or for the one trained without the Heart context.

We obtain the raw PINPPI network from the PINNACLE paper, which contains 15,461 nodes and 207,641 edges. We then convert the node IDs from gene symbols to Entrez IDs [74] using the MyGeneInfo query service [75]. We only preserve genes that have an exact one-to-one mapping from gene symbol to Entrez ID. After the above conversion, the final processed Entrez-based PINPPI network contains 15,229 nodes and 206,835 edges.

Table C.1: Gene set statistics. The first three gene set collections are used as prediction tasks, and the remaining three are used as contexts.