Contextualized Networks Reveal Heterogeneous Transcriptomic Regulation in Tumors at Sample-Specific Resolution

Cancers are shaped by somatic mutations, microenvironment, and patient background, each altering gene expression and regulation in complex ways, resulting in heterogeneous cellular states and dynamics. Inferring gene regulatory network (GRN) models from expression data can help characterize this regulation-driven heterogeneity, but network inference requires many statistical samples, traditionally limiting GRNs to cluster-level analyses that ignore intra-cluster heterogeneity. We propose to move beyond cluster-based analyses by using contextualized learning, a multi-task learning paradigm which allows us to infer sample-specific models using phenotypic, molecular, and environmental information pertinent to the model, encoded as the model's "context" to be conditioned on. We unify three network model classes (Correlation, Markov, Neighborhood) and estimate context-specific GRNs for 7997 tumors across 25 tumor types, with each network contextualized by copy number and driver mutation profiles, tumor microenvironment, and patient demographics. Contextualized GRNs provide a structured view of expression dynamics at sample-specific resolution, revealing co-expression modules in correlation networks (CNs), as well as cliques and independent regulatory elements in Markov networks (MNs) and neighborhood regression networks (NNs). Our generative modeling approach allows us to predict GRNs for unseen tumor types based on a pan-cancer model of how somatic mutations affect gene regulation. Finally, contextualized networks enable GRN-based precision oncology, explaining known biomarkers in terms of network-mediated effects, and leading to novel subtypings for thyroid, brain, and gastrointestinal tumors that improve survival prognosis.


Introduction
Tumors are heterogeneous, developing through clonal evolution that accumulates mutations, including cancer-driving single-nucleotide variants (SNVs) and somatic copy number alterations (SCNAs). In addition to tumor-cell-intrinsic changes, tumors develop in and are shaped by a microenvironment that includes immune cells, the extracellular matrix, blood vessels, and surrounding cells. This extensive heterogeneity necessitates heterogeneous treatments targeted to individual patients. However, estimating treatment effects and patient prognosis at patient-specific resolution implies an n-of-1 approach to treatment that is technically and temporally infeasible. Instead, methods have historically sought to identify prognostic biomarkers that stratify patients into tumor subtype cohorts, and predictive biomarkers that identify patients who generally respond to treatment. The Cancer Genome Atlas (TCGA) derives prognostic subtypes via cluster analysis on clinical and molecular data, including cancer-driving SNVs, SCNAs, DNA methylation, mitochondrial DNA, RNA-seq, miRNA, protein abundance arrays, histology images, patient demographics, and/or immunological data, and further identifies prognostic biomarkers as features that differentiate these clusters. While clusters can be analyzed in terms of feature stratification, clustering ignores the latent feature interactions and hierarchical feature relationships that define biological systems. Biomarkers identified by cluster analysis have no mechanistic interpretation, and require further experimentation to validate their role in tumorigenesis and tumor pathology. Consequently, the identification of biomarkers using somatic DNA alterations or gene expression patterns has proved challenging [24], but biological dogma and notable exceptions (e.g. HER2 amplification in breast cancer) motivate us to find a systematic way to search for differentiating regulatory factors that reflect cellular states and foreshadow cellular responses to treatments. In our view, biomarker discovery should directly inform the development of novel treatments, revealing molecular features that relate to the robustness or fragility of molecular systems in individual tumors. Addressing the shortcomings of cluster analysis, we focus on three questions: (1) how do we model the mechanisms of molecular interactions as they relate to tumorigenesis and treatment efficacy, (2) how do we identify prognostic biomarkers for rare diseases and outlier patients that are too sparsely sampled to cluster, and (3) how can we quantify the heterogeneity of tumor pathology, which is widely acknowledged but poorly understood, and encode or represent the myriad phenotypic, molecular, and environmental factors driving this heterogeneity from observational data alone? Toward representing interactions, gene regulatory networks (GRNs) represent the functional circuitry within cells that simultaneously responds to biomolecular stimuli and drives tumorigenesis. We intuit that many interactions between disparate biomolecular features can be identified at the cellular level through transcriptomic regulation, both directly and indirectly. Further, tumor-specific GRNs capture regulatory redundancy and fragility in individual cancers, whereby multi-omic features relate to GRN structure and organization, and GRN organization reveals the functional mechanisms of tumor pathology and the robustness of therapeutic targets. Single-cell and multi-omic profiling have advanced the potential for studying highly context-specific regulatory relationships in GRNs, but computational methods for inferring GRNs continue to rely on partitioning samples into homogeneous sets [25][26][27][28]. As such, existing methods for high-resolution network inference either impose strong biological priors based on known transcription factor-gene regulation [29], or apply a sample-left-out approach that lacks statistical power [30,31].
Partition-based modeling is insufficient to capture high-resolution or continuously rewiring GRNs, a problem for precision oncology because some types of cancer neither form discrete clusters [32] nor cluster by tissue of origin [33].
More generally, the exponential increase in data set complexity, heterogeneity, and size has motivated the need for sample-specific inference in many application areas [34][35][36][37][38]. Contextualized modeling [39] addresses this by representing the heterogeneity in data as driven by sample-specific models, and explaining variation among sample-specific models in terms of sample context encodings. These contexts can be any information that may explain heterogeneity in the data (e.g. age, genotype, medical images, environmental factors). More traditionally, context-driven heterogeneity might be controlled by performing probabilistic inference on context-specific data sets, but this fails to scale to the high-dimensional and continuously-varying contexts common in biomedical data, where context-conditioned data splits have as few as one sample and most context conditions are missing entirely. The simplest and most classic instances of the contextualization paradigm are varying-coefficient models [40], which account for the effects of a univariate, continuously-varying context on a linear model's parameters.
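To make the varying-coefficient idea concrete, the following is a minimal sketch (with illustrative toy data, not from this study): an outcome y depends on a predictor x through a coefficient beta(c) = a + b*c that varies linearly with a univariate context c, and fitting y on the augmented features [x, x*c] recovers (a, b).

```python
import numpy as np

# Toy varying-coefficient model: y = x * beta(c) + noise, where
# beta(c) = a + b*c varies with a univariate context c (e.g. age).
rng = np.random.default_rng(0)
n = 500
c = rng.uniform(0.0, 1.0, n)          # context
x = rng.normal(size=n)                # predictor
beta_true = -1.0 + 2.0 * c            # context-varying coefficient
y = x * beta_true + 0.1 * rng.normal(size=n)

# Fitting beta(c) = a + b*c reduces to least squares on [x, x*c].
F = np.column_stack([x, x * c])
coef, *_ = np.linalg.lstsq(F, y, rcond=None)
a_hat, b_hat = coef                   # recovers a ~ -1, b ~ 2
```

Modern contextualized models generalize this by replacing the linear map from context to coefficients with a learned neural encoder.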
Modern contextualized models, proposed in [41] and generalized in [39], combine statistical modeling and deep learning, where context encoders are typically neural networks that can utilize any multi-modal contextual information. This framework also introduces the concept of model archetypes (Figure 1d), whereby all sample-specific models are spanned by the set of model archetypes, constraining and explaining their variation through the context encoding which parameterizes this space (see Methods). Thus, these archetypes, also learned from data, link the heterogeneity of sample-specific models to variation in the context encoding and enable sharing information between sample-specific model inference tasks. Many notable works on heterogeneous linear effects use this framework [41][42][43][44][45][46], but contextualized models have yet to be extended to the more general graphical modeling regime.
To infer tumor-specific GRNs that account for patient-to-patient heterogeneity, we propose to reframe GRN inference within the contextualized modeling paradigm, thereby sharing information among tumor-specific inference tasks by relating these tasks through their clinical and molecular contexts. By recasting networks as the output of a learnable function, our approach shares statistical power between samples while also permitting fine-grained variation to capture the complexity of sample-specific contexts such as tissue of origin, somatic mutation landscape, tumor microenvironment, and clinical measurements. We formulate three types of GRNs (Markov, Neighborhood, and Correlation networks) under this paradigm, and estimate sample-specific GRNs which enable sample-specific analyses of latent regulatory processes. By applying this computational framework to over 7000 samples, we find that contextualized networks improve prediction of held-out expression data and reveal latent heterogeneity that has previously been obscured by partition-based methods of network inference. Contextualized models achieve this improved predictive performance by accounting for contextual dependencies in model parameters without imposing prior assumptions on the form of these dependencies. As a result, contextualized graphical models capture context-specific effects that can be overlooked by group-level modeling approaches (e.g. cluster-specific or disease-specific models).

Contextualized Networks Share Power Between All Cancer Types and Infer Models for Unseen Diseases
Contextualization relates transcriptional regulation to genomic variation through a context encoder (Fig. 1). During training, the encoder learns to modify the parameters of a downstream network model in response to contextual signals.
At test time, the encoder uses learned context signals to generalize between sparsely sampled contexts. Rare or undersampled diseases like Kidney Chromophobe (KICH) and Glioblastoma multiforme (GBM) can benefit from contextual signals learned from well-sampled diseases in similar tissues (Figure 2b). In disease-specific modeling, these smaller subpopulations must either be lumped into a larger tissue group, ignoring subpopulation heterogeneity, or modeled individually, sacrificing statistical power in a "large p, small n" regime. For example, there are n = 75 training samples from KICH patients, while each disease-specific network has 50 × 50 edges, or p = 2500 parameters; estimating a disease-specific network from such limited data would be prohibitively high-variance.
Furthermore, contextualization adapts models to unseen contexts at test time, responding even to extreme distribution shift (Fig. 2a). For completely unseen contexts, the context encoder can still leverage learned relationships between contexts and models to infer zero-shot network models on demand. We evaluate model performance through disease-fold cross-validation, where we hold out each of the 25 disease types in turn and learn to contextualize networks on the remaining 24. Notably, disease-specific modeling cannot be applied in this regime. In contrast, contextualized networks improve model performance and reduce error on 22 of 25 held-out diseases, even when generalizing to an entirely new disease type.
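The disease-fold evaluation protocol can be sketched as follows. This is a toy illustration with made-up data: the stand-in "model" simply predicts the training-set mean, where the real study would train a contextualized network on the retained diseases and score it zero-shot on the held-out one.

```python
import numpy as np

# Disease-fold cross-validation sketch: hold out each disease (context group)
# in turn, fit on the remaining groups, and score on the held-out group.
rng = np.random.default_rng(1)
disease = np.repeat(np.arange(5), 40)            # 5 toy diseases, 40 samples each
X = rng.normal(loc=disease[:, None], size=(200, 3))

holdout_mse = {}
for d in np.unique(disease):
    train, test = disease != d, disease == d
    model = X[train].mean(axis=0)                # placeholder for model training
    holdout_mse[int(d)] = float(((X[test] - model) ** 2).mean())
```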

Contextualized Networks Reveal Tissue-Specific Regulatory Modules
Contextualization produces context-specific network models, resulting in patient-specific networks for all 7997 patients in our TCGA dataset. Organizing patients according to their network models reveals that tissue type is a primary driver, but not the sole factor, in determining gene-gene interactions (Fig. 3). In particular, diseased networks differ drastically from healthy networks, while gene and PCA-derived metagene expression profiles are still largely tissue-derived. Additionally, intra-disease (Fig. 5a) and inter-disease (Fig. 6a) subtypes are visible even at pan-cancer resolution (Fig. 3), making obvious common tumorigenesis mechanisms that underlie noisy gene expression dynamics.

Contextualized Networks Reveal Regulatory Modules Conserved Across Tissues in Cancer
Contextualized networks reveal that tumors of the GI tract display a continuum of GRN dysregulation (Figure 6).
While this continuum cannot be captured by existing TCGA subtypes [47], contextualized networks form clusters that relate existing subtypes to inter-disease and intra-disease heterogeneity via conserved regulatory motifs and shared dysregulation motifs. Finally, contextualized networks discover disparate types of GRN dysregulation within patients assigned to the SCNA-derived GI.CIN subtype, which comprises the majority of GI tract tumors (Fig. 6a). Re-assigning patients based on GRN-derived subtypes improves prognosis (Fig. 6b) and reveals biomarkers of these dysregulation subtypes (Fig. 6a), including SNV-SCNA interactions such as HRAS mutations with chromosome 18p loss of heterozygosity. For each of the 25 tumor types, we cluster patients by their contextualized networks to identify network-based tumor subtypes, flattening the network parameters and applying hierarchical Ward clustering. To compare the prognostic utility of network-based subtypes against state-of-the-art TCGA subtypes and expression subtypes, we use the same number of clusters for each disease as subtypes annotated in TCGA. We find that network-based subtypes are more prognostic on average than both expression-derived subtypes and TCGA subtypes (Table 2).
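The flatten-and-cluster step described above can be sketched in a few lines. Shapes, random data, and the cluster count here are illustrative placeholders; in the study the number of clusters per disease matches the annotated TCGA subtype count.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Network-based subtyping sketch: flatten each patient's inferred network
# into a vector, then apply hierarchical Ward clustering.
rng = np.random.default_rng(2)
networks = rng.normal(size=(120, 50, 50))   # 120 patients, 50x50 edge weights
flat = networks.reshape(len(networks), -1)  # one row of edge weights per patient

Z = linkage(flat, method="ward")
subtypes = fcluster(Z, t=4, criterion="maxclust")  # e.g. 4 annotated subtypes
```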

Contextualized Networks Discover Novel Prognostic Subtypes
In general, we find that network-based subtypes either recapitulate or refine prognostic subtypes produced by TCGA, which often utilize additional data types including DNA methylation, miRNA, and histopathological imaging. All subtype comparisons by disease are available in Appendix S4. For 10 of 25 tumor types, contextualized networks reduce one of the survival-function p-values by at least an order of magnitude: as much as 9 orders of magnitude on KIRC, 4 orders of magnitude on LGG, 2.4 orders of magnitude on THCA, and 2.3 orders of magnitude on HNSC. On KICH, both network subtypes and TCGA subtypes are outperformed by expression subtypes by 13.5 orders of magnitude. In the second- and third-worst cases for contextualized networks, network subtypes are outperformed by TCGA subtypes on GBM and UVM in terms of survival prognosis by about 1.5 and 1.3 orders of magnitude, respectively.

Discussion
In this study, we propose contextualized GRNs as cohesive sample-specific representations of latent tumor states underlying disease progression and survival. Our models reveal new insights about cancer heterogeneity by relating transcriptomic, genetic, immune, and clinical factors through tumor regulatory network topology.
The importance of context in cancer development and treatment is well recognized, with treatment decisions frequently determined by a tumor's tissue of origin. The frequency of specific driver mutations varies substantially between tumors of different tissues and likely reflects the importance of distinct signaling pathways within distinct cellular contexts [48]. For example, BRAF(V600E) driver mutations vary substantially in frequency across cancer types, and drugs that target the BRAF(V600E) mutant product are less effective in colorectal cancers than in skin cutaneous melanoma and non-small cell lung cancers with this mutation [49]. Further emphasizing the importance of context beyond the tissue level, considerable variation in aggressiveness, drug sensitivity, and genomic mutations is also observed between tumors arising from the same cell type and tissue [50]. These heterogeneous genetic contexts likely hinder efforts to define tumor subgroups based on specific mutations, with epistasis, the action of one gene on another, having been shown to affect treatment efficacy in acute myeloid leukemia, where NPM1 mutations confer a favorable prognosis only in the presence of a co-occurring IDH1 or IDH2 mutation [51].
Although genetic heterogeneity between tumors of the same tumor type is known to be widespread, it has long been thought that heterogeneity at the phenotype level may not be so marked, with the same cellular pathways often affected [52]. For example, dysregulation of the G1-S transition is observed in almost all cancers and may occur through multiple routes, both promoting proliferation and overriding cellular senescence [53]. However, in spite of the evidence for functional convergence, it is challenging with current statistical methods to identify biomarkers that define similar phenotypes in genetically diverse contexts in order to guide treatment.
Many promising expression-based biomarkers use the level of expression across gene pathways or multiple genes rather than identifying specific somatic mutations [24]. Contextualized GRNs provide an intuitive way of identifying both subpopulations with differential transcriptomic regulation and the pathway-level cohorts of genes that should be studied as potential biomarkers, as well as the likely effect size of pathway dysregulation. Contextualized GRNs further associate contextual signals with these subpopulations, providing new leads for traditional classes of genomic biomarkers.
More broadly, contextualized modeling seeks to estimate context-specific models beyond context-specific sampling constraints. By sharing information among samples while also allowing sample-specific variation, our framework models complex and dynamic distributions despite physical and technical barriers that would typically prohibit sample-specific inference. Context-dependent models naturally account for non-identically distributed data and provide a principled method for performing statistical inference on data sets that would traditionally be too small or too heterogeneous. While it is generally believed that biological observations are a product of latent cellular states and that tumors exhibit extreme patient-to-patient heterogeneity, these ideas are orthogonal in traditional modeling regimes.
Contextualized GRNs are the first to effectively unite the two: networks are a useful latent representation, relating biomarkers to pathology through systems of molecular interactions, and accounting for network heterogeneity allows us to explore both population-level and per-patient tumor pathology in terms of latent representations of molecular systems.

Contextualized Networks
We seek a context-specific density of network parameters P(θ | C) such that the likelihood of the observed expression data, E_{X,C}[log P_M(X | θ(C))], is maximized, where P_M(X | θ) is the probability of gene expression X ∈ R^p under network model class M with parameters θ ∈ R^{p×p} and context C, which can contain both multivariate and real-valued features. To overcome θ being a high-dimensional, structured latent variable, we assume that all contextualized networks lie on a subspace spanned by a set of K network archetypes, A := span({A_1, ..., A_K}) with A_k ∈ R^{p×p}, i.e. θ ∈ A. Further, the space spanned by A is parameterized by a latent variable Z (the "subtype"), and the context-specific network model θ (and subsequently the gene expression observations X) are independent of context given Z, i.e. C ⊥ (X, θ) | Z. In this way, we constrain θ to a convex combination of network archetypes via latent mixing.
The context-specific network parameters are given by θ = ϕ(C; f, A) = Σ_{k=1}^{K} f(C)_k A_k, where the context encoder ϕ(C; f, A) is parameterized by a learnable context-to-subtype mapping f and the set of archetypes A. This architecture is shown in Figure 1d and is learned end-to-end with backpropagation. While the archetypal networks A_k could use prior knowledge for initialization or regularization, no prior knowledge is required.
In all experiments reported here, we do not use any prior knowledge of network structure or parameters.
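The archetype architecture can be sketched as a small PyTorch module. The encoder's 3×100 layout follows the description in Appendix S1; the class name, layer initializations, and unconstrained archetype weights are illustrative assumptions, not the reference implementation (which lives in ContextualizedML).

```python
import torch
import torch.nn as nn

class ContextualizedNetwork(nn.Module):
    """Sketch: an MLP maps context C to archetype weights (the "subtype" Z),
    and the sample-specific network is the weighted sum of learned archetypes."""

    def __init__(self, context_dim: int, n_genes: int, n_archetypes: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(context_dim, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, n_archetypes),
        )
        # Learnable network archetypes A_k, each a p x p edge-weight matrix.
        self.archetypes = nn.Parameter(0.01 * torch.randn(n_archetypes, n_genes, n_genes))

    def forward(self, C: torch.Tensor) -> torch.Tensor:
        z = self.encoder(C)                                     # (batch, K) subtype weights
        return torch.einsum("bk,kij->bij", z, self.archetypes)  # (batch, p, p) networks
```

A batch of context rows thus yields one p × p network per sample, and the archetypes and encoder train jointly against any of the network losses below.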
This framework unites three different perspectives on GRNs: (1) Correlation networks, in which network edges are the pairwise Pearson correlations between nodes, (2) Markov networks, in which edges are the pairwise precision values representing conditional dependencies between nodes, and (3) Neighborhood regression networks, in which edges represent directed linear relationships between nodes. The key challenge for each network class is to define a differentiable loss function ℓ_M that is proportional to the negative log probability of our contextualized network model.
The loss objective can be used in end-to-end optimization, solving for the context encoder and the network archetypes simultaneously, and subsequently inferring the context-specific parameters θ. Below, we outline a unifying linear parameterization of each network loss. Implementation details are discussed in Appendix S1.

Contextualized Neighborhood Regression
We first apply contextualization to the graph variable-selection algorithm proposed by Meinshausen and Bühlmann [54]. The direct relationship of this model to lasso regression links contextualized neighborhood regression to the original works on contextualized linear models [41], making it a convenient stepping stone toward the graphical models in the sequel. The model is a Gaussian graphical model where X ∼ N(0, Σ) and Σ has sparse off-diagonal entries. The algorithm, neighborhood regression, recovers edges between nodes with non-zero partial correlations by solving a lasso regression for every feature X_i given every other feature X_{-i}, where the regression maximizes P(X_i | X_{-i}) via θ̂_i = argmin_θ E[(X_i − Σ_{j≠i} θ_{ij} X_j)^2] + λ Σ_{j≠i} |θ_{ij}|, resulting in edges with source X_j for every j ≠ i, sink X_i, and strength θ_ij, or no edge if θ_ij = 0. Equivalently, we parameterize the neighborhood selection objective using the square matrix of network edge parameters θ ∈ R^{p×p}.
The contextualized neighborhood network objective replaces θ for each sample with a context-specific θ_n = ϕ(C_n; f, A). Finally, we define a function ϕ′ to mask the diagonal of θ, giving the loss function for contextualized neighborhood regression networks, ℓ_NN = Σ_n ||X_n − ϕ′(C_n; f, A) X_n||_2^2 with ϕ′(C; f, A) = ϕ(C; f, A) ⊗ (1 − I), where ⊗ is the Hadamard product.
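A minimal differentiable version of this loss, under our reading of the indexing convention above (theta[n, i, j] weights gene j in the prediction of gene i for sample n), might look like:

```python
import torch

def neighborhood_loss(X: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Sketch of the neighborhood-regression loss: predict each gene i from
    all other genes j via sample-specific edges theta[n, i, j], with the
    diagonal Hadamard-masked (phi') so a gene never predicts itself."""
    p = X.shape[-1]
    mask = 1.0 - torch.eye(p)                           # zero the diagonal
    pred = torch.einsum("nij,nj->ni", theta * mask, X)  # per-sample X_hat
    return ((X - pred) ** 2).mean()
```

In training, `theta` would be the context encoder's output, so gradients of this loss flow back through the archetypes and encoder.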

Contextualized Markov Networks
Linear regression and Gaussian graphical models are intrinsically related, allowing us to extend work on contextualized linear models to various graphical representations of the Gaussian graphical model. To estimate sample-specific precision matrices representing the conditional dependency structure of an undirected graphical model or Markov network, we assume the data is drawn from X ∼ N(0, Ω^{-1}) where Ω = Σ^{-1} and estimate pairwise partial correlation coefficients. Using an equivalence defined by Peng et al. [55], the partial correlation coefficient is defined by regression as ρ_ij = −ω_ij / √(ω_ii ω_jj) = sign(β_ij) √(β_ij β_ji), where the precision matrix Ω has elements ω_ij and β is the ordinary least squares solution to the multivariate linear regression X_i = Σ_{j≠i} β_ij X_j + ε_i, with β_ij = −ω_ij / ω_ii. Critically, the precision matrix directly encodes conditional independence between features in X, and thus the precision matrix encodes the Markov network.
Following [56], we assume constant diagonal precision, ω_ii = ω_jj ∀ i, j, and therefore achieve proportionality between the regression coefficients and the precision matrix.
Assuming unit diagonal precisions ω_ii = 1, the proportionality becomes exact equivalence. Further, proportionality induces symmetry in the regression, i.e. β_ij = β_ji. We encode this in the objective by requiring our estimate for θ to be a symmetrically augmented matrix based on γ, i.e. θ = γ + γ^T.
If Ω is sparse, we can apply lasso regularization to the multivariate regression objective [54]. Given the similarity between this differentiable Markov network objective and the neighborhood regression objective, we follow the same contextualization procedure as above to contextualize γ, arriving at a loss function where ϕ′ is defined identically for masking the diagonal. The resulting precision matrix estimate is Ω̂ = −(ϕ′(C; f̂, Â) + ϕ′(C; f̂, Â)^T). In practice, we do not threshold the estimated precision to non-zero values, instead using the exact precision matrix to represent the Markov network, retaining information about dependency strength as well as dependency structure.
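The precision-recovery step above can be sketched as follows. Masking, symmetrizing, and negating follow Ω̂ = −(ϕ′ + ϕ′^T); explicitly restoring the assumed unit diagonal (ω_ii = 1) is our own added step, which the text's formula leaves implicit.

```python
import torch

def markov_precision(gamma: torch.Tensor) -> torch.Tensor:
    """Sketch: recover the Markov-network precision matrix from the
    contextualized regression output gamma via Omega = -(phi' + phi'^T),
    where phi' masks the diagonal; the unit diagonal is then restored."""
    p = gamma.shape[-1]
    masked = gamma * (1.0 - torch.eye(p))       # phi': zero the diagonal
    omega = -(masked + masked.transpose(-1, -2))
    return omega + torch.eye(p)                 # assumed omega_ii = 1
```

The symmetrization guarantees a valid undirected dependency structure regardless of the raw encoder output.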

Contextualized Correlation Networks
Correlation networks are simple to estimate and often state-of-the-art for gene regulatory network inference [27]; contextualized correlation networks extend this utility to the granularity of sample-specific network inference. To estimate sample-specific correlation networks, we assume the data was drawn from X ∼ N(0, Σ) and use the well-known univariable regression view of Pearson's marginal correlation coefficient, ρ_ij = σ_ij / √(σ_ii σ_jj) = sign(β_ij) √(β_ij β_ji), where the covariance matrix Σ has elements σ_ij, and β_ij = σ_ij / σ_jj is the slope of the univariate regression of X_i on X_j. This form converts correlation into two separable univariate least-squares regressions that maximize the marginal conditional probabilities P(X_i | X_j) and P(X_j | X_i). Contextualizing this differentiable objective, we get the contextualized correlation network loss ℓ_CN = Σ_n Σ_{i,j} (X_{n,i} − β_{n,ij} X_{n,j})^2, where β_n = ϕ(C_n; f, A) and the context-specific correlation matrix is reconstructed as ρ̂_{n,ij} = sign(β̂_{n,ij}) √(β̂_{n,ij} β̂_{n,ji}).
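The regression view of Pearson correlation can be verified numerically: the two univariate slopes β_ij = σ_ij/σ_jj and β_ji = σ_ij/σ_ii multiply to ρ_ij², with sign shared by both slopes. A small check against the standard correlation estimator:

```python
import numpy as np

def correlation_from_slopes(X: np.ndarray) -> np.ndarray:
    """Sketch: rho_ij = sign(beta_ij) * sqrt(beta_ij * beta_ji), where
    beta[i, j] = sigma_ij / sigma_jj is the slope of regressing X_i on X_j."""
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / len(X)       # sigma_ij
    var = np.diag(cov)               # sigma_jj
    beta = cov / var[None, :]        # beta[i, j] = sigma_ij / sigma_jj
    return np.sign(beta) * np.sqrt(beta * beta.T)
```

Because normalization factors cancel in the product of the two slopes, this reproduces the sample correlation matrix exactly.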

Baselines
We compare contextualized modeling with several traditional approaches to context-controlled and context-agnostic inference, including population modeling, cluster modeling, and cohort modeling (Fig. 7). A population model assumes that the entire sample population is identically distributed. As a result, population modeling infers a single model representing all observations. In reality, sample populations often contain two or more uniquely distributed subpopulations. If we expect that there are several subpopulations with many observations each, and that these subpopulations can be stratified by context, it may be appropriate to cluster the data by context to identify these subpopulations and then infer a model for each context-clustered subpopulation. This assumes that all context features are equally important and therefore does not tolerate noise features well. Alternatively, when subpopulation groupings are known to be determined by a few important features, cohort modeling is more appropriate. Sample cohorts can be identified based on prior knowledge about important context features (e.g. disease type). These baseline modeling regimes enjoy the benefits of traditional inference methods (i.e. identifiability, convergence) by relying on the assumption that there are a discrete number of subpopulations underlying the observed data, each defined by a latent model and each well-sampled. This assumption is rarely, if ever, satisfied in real-world settings. We develop contextualized modeling as a synthesis of traditional statistical inference and modern deep learning to enable model-based analysis of heterogeneous real data. Contextualized modeling assumes a functional dependency between contexts and models but, unlike prior methods, makes no assumption about the form or complexity of this dependency. As such, contextualized models permit context-informed inference even when contexts are sparsely sampled and high-dimensional.

Data
Our dataset is constructed from The Cancer Genome Atlas (TCGA) and related studies, covering 7997 samples from 7648 patients, with 6397 samples for training and validation and 1600 for testing. For context, we use clinical information, biopsy composition, SCNAs, and cancer-driving SNVs (Appendix S2). Gene expression data was log-transformed and compressed to a set of cancer driver genes, then transformed using PCA into 50 metagenes. Networks were learned on the metagene expression data.
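The expression preprocessing described above can be sketched as follows, with a toy count matrix standing in for the TCGA driver-gene panel; PCA is computed here via SVD of the centered log-expression matrix.

```python
import numpy as np

# Preprocessing sketch: log-transform expression, then compress the
# driver-gene panel to 50 PCA metagenes per sample.
rng = np.random.default_rng(3)
counts = rng.poisson(20.0, size=(200, 100)).astype(float)  # samples x genes
log_expr = np.log1p(counts)

centered = log_expr - log_expr.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
metagenes = centered @ Vt[:50].T            # 50 metagenes per sample
```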

Supplemental Information for:
Contextualized Networks Reveal Heterogeneous Transcriptomic Regulation in Tumors at Sample-Specific Resolution

S1 Implementation
Our entire framework (Fig. 1) is implemented in PyTorch using the PyTorch Lightning framework within our open-source software ContextualizedML. The context encoder, network archetypes, and contextualized network models are learned simultaneously using end-to-end backpropagation of the network loss (defined in Methods).

Context Encoder & Training
The context encoder is implemented as a multi-layer perceptron with 3 hidden layers, each 100 neurons wide, with ReLU activations. The context data views (Appendix S2) serve as the encoder input. We use early stopping with a patience of 5, ending training when the minimum validation loss has not improved for 5 epochs. We retain only the model with the minimum validation loss for each bootstrap. In Table 1, we evaluate these bootstraps individually to obtain error means and variances. Following this, we apply each of our bootstrapped models to the non-bootstrapped training-validation set and average the outputs of each model to obtain a single graph for each patient in this set, which we evaluate in depth in Figures 4, 5, and 6, and for all disease types in Appendix S4.
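The early-stopping rule described above can be sketched generically; `run_epoch` is a hypothetical callback (one epoch of training plus validation, returning the validation loss), not part of the actual training code.

```python
# Early-stopping sketch: stop once the minimum validation loss has not
# improved for `patience` consecutive epochs, keeping the best epoch.
def train_with_early_stopping(run_epoch, max_epochs=100, patience=5):
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)      # hypothetical train + validate step
        if val_loss < best_loss:
            best_loss, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:       # no improvement for `patience` epochs
                break
    return best_epoch, best_loss
```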
The context encoder is a highly flexible component of our framework and a driving force for future work. This flexibility can be used to enforce assumptions about the relationships between contexts and models, between context features, and about the archetype space. For instance, using a neural additive model instead of a multi-layer perceptron provides context-feature-specific archetype weights for interpretability. Similarly, we can augment our context encoder with a convolutional base to include imaging modalities in our context views. At the context encoder head, we currently use an unconstrained output, but applying a softmax activation would constrain all sample-specific models to lie within a polytope defined by the archetypal networks.

S2.1 Data sources
The Cancer Genome Atlas (TCGA) is a publicly available pan-cancer data source containing genomic, transcriptomic, and clinical profiling of tumors from dozens of landmark studies. We queried TCGA for samples with bulk RNA sequencing and merged this dataset with two follow-up studies on an overlapping set of patients.
Somatic copy number alterations (SCNAs). SCNAs affect a larger fraction of the genome than any other type of somatic genetic alteration [57] and are a major driver of expression variation in cancer [58]. We used copy number profiles derived from TCGA samples using ASCAT [59], from a pan-cancer study of the role of allele-specific SCNAs in cancer [60].
Driver single-nucleotide variants (SNVs). SNVs can be classified into "driver" mutations thought to provide a selective growth advantage and "passenger" mutations thought to have little role in promoting cancer development.

Context data views
Clinical information. This data view incorporates sample tissue-of-origin, race, age at diagnosis, gender, year of birth, and days to collection provided by TCGA.
Biopsy Composition. This data view contains the sample's percent tumor cells, percent normal cells, percent tumor nuclei, percent monocyte infiltration, percent lymphocyte infiltration, and percent neutrophil infiltration provided by TCGA. We also incorporate expression-derived estimates of the fraction of a sample consisting of tumor cells from [60].
Copy Number Alterations. From ASCAT [59], we gather whole-genome doubling events as well as gain and loss events for bp-specific regions of hg19 based on data from [62]. We transform these gain and loss events into both arm-level and gene-level events, where arm-level events affect at least 85% of an entire arm in the same event, while gene-level events affect a single gene. We transform these into the number of major and minor copies of each chromosome arm, and the number of major and minor alleles for the set of 295 genes that overlap between COSMIC [63] and MSigDB [64].
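The gene-to-arm collapse described above can be sketched as a simple length-weighted threshold rule; the 85% cutoff follows the text, while the segment representation and values below are illustrative.

```python
import numpy as np

# Arm-level event sketch: call an arm-level event when at least 85% of the
# arm's length is affected by the same gain/loss event.
def arm_level_event(segment_lengths, affected, threshold=0.85):
    """segment_lengths: bp length of each segment on one arm;
    affected: boolean mask of segments carrying the event."""
    frac = segment_lengths[affected].sum() / segment_lengths.sum()
    return bool(frac >= threshold)
```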
For both gene and arm-level events, we create a separate indicator for loss of heterozygosity on each gene.
Driver Mutations. From CHASMplus [61], we gather the mutations on all COSMIC [63] oncogenes/tumor suppressor genes and binarize the presence or absence of a mutation in each gene.

S2.3 Transcriptomic data views
Transcriptomics We take the set of known oncogenes/tumor suppressor genes annotated in COSMIC [63] and included in TCGA gene expression panels. We then calculate the variance of each gene within each tumor type and take a weighted sum of these variances, weighting by the total number of samples in each tumor type. We select the top 100 genes by this metric of "intra-disease variance".
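The gene-selection metric above can be sketched in a few lines of numpy. This is an illustrative reading of the procedure (the function name is ours, and we assume the weights are the raw per-cohort sample counts):

```python
import numpy as np

def intra_disease_variance(expr, disease):
    """Rank genes by sample-count-weighted within-disease variance.

    expr    : (n_samples, n_genes) expression matrix
    disease : length-n_samples array of tumor-type labels
    Returns gene indices sorted by descending weighted variance.
    """
    expr = np.asarray(expr, dtype=float)
    disease = np.asarray(disease)
    score = np.zeros(expr.shape[1])
    for d in np.unique(disease):
        sub = expr[disease == d]
        score += len(sub) * sub.var(axis=0)  # weight by cohort size
    return np.argsort(-score)

# select the top 100 genes: intra_disease_variance(expr, disease)[:100]
```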

S3 Related work
State-of-the-art gene regulatory network estimators are limited to population-, cohort-, and cluster-based approaches [26,65,66]. Other proposals to estimate networks as the difference between a population model and a sample-left-out model lack statistical power [30]. Kolar et al. achieve sample-specific network estimation without sacrificing statistical power by using an approach similar to classic varying-coefficient models that weights samples by their distance in context [67]. However, this approach inherently assumes smoothness of the parameters over context, which does not align with our understanding of the non-linear, switch-like changes in biological systems that lead to disease. Contextual explanation networks (CENs) remove this smoothness assumption by inferring the relationship between context and model parameters with a neural network, but the CEN framework was only proposed as an adaptive learning approach for linear models [41]. Context-varying linear models have previously been applied to multi-omic cancer data, where context-varying coefficients inform how epigenetic markers have patient-specific effects on clinical outcomes [42]. Linear models do not inform us of the differential gene-gene interactions that explain changes in cellular behavior. To understand regulatory and metabolic variation at per-sample resolution, we require network models with context-varying structures and parameters.
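The varying-coefficient idea of weighting samples by their distance in context can be sketched as a kernel-weighted covariance estimate. This is a minimal illustration of the smoothness assumption being discussed, with an RBF kernel and function names of our own choosing, not the exact estimator of [67]:

```python
import numpy as np

def kernel_weighted_cov(X, C, c0, bandwidth=1.0):
    """Sample-specific covariance via RBF weights over context.

    X : (n, p) expression matrix; C : (n, q) contexts; c0 : (q,) target context.
    Samples near c0 in context get higher weight, which implicitly assumes
    the network varies smoothly with context.
    """
    X, C = np.asarray(X, float), np.asarray(C, float)
    d2 = ((C - c0) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    w = w / w.sum()                      # normalized sample weights
    mu = w @ X                           # weighted mean
    Xc = X - mu
    return (Xc * w[:, None]).T @ Xc      # weighted covariance estimate
```

When all contexts are identical, the weights are uniform and this reduces to the ordinary (biased) sample covariance.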

Figure 1:
Figure 1: (a) Traditional modeling approaches assume each training cohort or (sub)population is homogeneous and samples are identically distributed. Cohorts must be large enough to allow robust inference, presenting a tradeoff between personalization and power. (b) Contextualization assumes model parameters are a function of context, allowing powerful context-specific inference without a priori clustering of subpopulations or assuming homogeneity. Contexts can be unique to each sample, permitting sample-specific model inference. (c) Sample-specific models reveal population heterogeneity, relate rare pathological mechanisms to more common ones, and provide new data views for prognosis and biomarker identification. (d) Graphical depiction of the deep learning framework. Sample context is used to predict weights on each of the model archetypes, which we call the subtype. The sample-specific network is estimated as the tensor dot product of archetypal networks and subtype weights. The network archetypes are learned simultaneously alongside the context encoder using backpropagation.
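The archetype-mixture construction in panel (d) can be sketched as a single forward pass. The toy linear-softmax encoder, the dimensions, and the random placeholder parameters below are illustrative only; in the full model, the encoder and archetypes are learned jointly by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, k, n_ctx = 5, 3, 4        # toy sizes: genes, archetypes, context dims

# Placeholders for learned parameters (trained jointly in the real model).
archetypes = rng.normal(size=(k, n_genes, n_genes))   # archetypal networks
W = rng.normal(size=(k, n_ctx))                       # toy linear encoder

def encoder(c, W):
    """Map a context vector to softmax subtype weights over archetypes."""
    z = W @ c
    return np.exp(z) / np.exp(z).sum()

context = rng.normal(size=n_ctx)
subtype = encoder(context, W)                         # weights sum to 1
# Sample-specific network: tensor dot product of subtype weights and archetypes.
network = np.tensordot(subtype, archetypes, axes=1)   # shape (n_genes, n_genes)
```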

Figure 2:
Figure 2: Performance of Contextualized Markov Networks. (a) Disease-fold cross-validation, in which each of the 25 disease types is held out from training and evaluated only at testing time. Disease-specific network inference cannot be applied in this regime. (b) Testing on held-out patients. Results are from 30 bootstrapped runs for each hold-out disease type and the hold-out patient set. Bar height is the group-averaged mean squared error of the bootstrap-averaged network models. Error bars are the standard deviation over bootstraps of the group-averaged mean squared error of the network models.

Figure 3:
Figure 3: Embeddings, colored by disease type, reveal the organization of different disease views. Context views alone cannot capture tumor disease types. Transcriptomic views recapitulate disease types. Contextualized networks discover new separations and similarities, revealing disease subtypes and cross-disease relationships.

Figure 4:
Figure 4: Exploration of network subtypes for LGG, looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure 5:
Figure 5: Exploration of network subtypes for THCA, looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure 6:
Figure 6: Exploration of cross-disease network subtypes for cancers of the GI tract, including READ, COAD, STAD, and ESCA, looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.
We are not aware of any other scalable meta-learning, deep learning, or varying-coefficient methods to produce context-informed correlation, Markov, and Bayesian networks under a universal framework. As such, our baselines apply the network estimators in S1 under several well-known and general paradigms for improving model personalization, broadly relating to cluster analysis. Our population baseline provides no personalization, learning a single model for the entire population of training samples. Our context-clustered baseline takes an unsupervised approach to personalization by first performing k-means clustering with k=25 on the aggregated context views (S2) and then inferring cluster-specific networks. Our disease-clustered baseline uses a personalization oracle, grouping samples by tumor type and then inferring disease-specific networks.
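The context-clustered baseline can be sketched as follows, here with correlation networks as the cluster-specific estimator. This is a toy illustration (a minimal numpy k-means with k=2 rather than k=25, and function names of our own):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means over context features (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest center
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_networks(context, expr, k=2):
    """Cluster samples on context, then fit one correlation network per cluster."""
    context = np.asarray(context, float)
    expr = np.asarray(expr, float)
    labels = kmeans(context, k)
    return {int(j): np.corrcoef(expr[labels == j], rowvar=False)
            for j in np.unique(labels)}
```

The disease-clustered baseline is identical except that `labels` comes from the tumor-type annotation rather than k-means.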

Figure S1:
Figure S1: Exploration of network subtypes for Bladder Urothelial Carcinoma (BLCA), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S2: Figure S3:
Figure S2: Exploration of network subtypes for Breast invasive carcinoma (BRCA), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S4: Figure S5: Figure S6:
Figure S4: Exploration of network subtypes for Colon adenocarcinoma (COAD), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S7: Figure S8:
Figure S7: Exploration of network subtypes for Head and Neck squamous cell carcinoma (HNSC), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S9:
Figure S9: Exploration of network subtypes for Kidney renal clear cell carcinoma (KIRC), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S10:
Figure S10: Exploration of network subtypes for Kidney renal papillary cell carcinoma (KIRP), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S11:
Figure S11: Exploration of network subtypes for Liver hepatocellular carcinoma (LIHC), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S12:
Figure S12: Exploration of network subtypes for Brain Lower Grade Glioma (LGG), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S13:
Figure S13: Exploration of network subtypes for Lung adenocarcinoma (LUAD), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S14:
Figure S14: Exploration of network subtypes for Lung squamous cell carcinoma (LUSC), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S16:
Figure S16: Exploration of network subtypes for Ovarian serous cystadenocarcinoma (OV), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S17: Figure S18:
Figure S17: Exploration of network subtypes for Pancreatic adenocarcinoma (PAAD), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S19:
Figure S19: Exploration of network subtypes for Rectum adenocarcinoma (READ), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S21:
Figure S21: Exploration of network subtypes for Stomach adenocarcinoma (STAD), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S22: Figure S23:
Figure S22: Exploration of network subtypes for Thyroid carcinoma (THCA), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Figure S24: Figure S25:
Figure S24: Exploration of network subtypes for Uterine Corpus Endometrial Carcinoma (UCEC), looking at correlated clinical information, arm-level copy alterations, gene-level copy alterations, and gene-level single nucleotide variations.

Table 2:
Stratification of disease subtypes in terms of survival. Survival tests quantify the difference in survival distributions between groups as a p-value. Contextualized networks improve on both tests on average by several orders of magnitude compared with other subtyping methods. The multivariate log-rank test quantifies overall stratification of survival distributions across all subtypes. The minimum pairwise result is the minimum p-value of all pairwise subtype tests, showing the maximum survival stratification between prognostic subtypes.
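The pairwise comparisons rest on the two-sample log-rank test, which can be sketched from first principles. This is a stdlib-plus-numpy illustration, not the implementation used for the reported results; the 1-degree-of-freedom chi-square p-value uses the identity P(X > x) = erfc(sqrt(x/2)):

```python
import math
import numpy as np

def logrank(time, event, group):
    """Two-sample log-rank test.

    time  : event/censoring times
    event : 1 if the event (death) was observed, 0 if censored
    group : 0/1 group labels
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    time, event, group = map(np.asarray, (time, event, group))
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n          # observed minus expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(stat / 2))            # chi-square(1) tail
    return stat, p
```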
Context views (S2) are concatenated sample-wise to create a single context feature vector encompassing all views for each patient. We use a batch size of 10, and the learning rate is chosen automatically using PyTorch Lightning's auto-lr-find with an initial value of 1e-3. Model weights are initialized as Uniform[-0.01, 0.01]. We split our dataset into 80% training-validation and 20% testing. We create 30 bootstraps of the training-validation set and finally split each into 80% training and 20% validation, resulting in a 64-16-20 train-validation-test split where the train and validation sets are bootstrapped to evaluate model variance.
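The 64-16-20 splitting scheme can be sketched as follows; one reading of the procedure, with the function name and the ordering (bootstrap first, then 80/20 train/validation) as our own assumptions:

```python
import numpy as np

def make_splits(n, n_bootstraps=30, seed=0):
    """Hold out 20% for testing, then draw bootstraps of the remaining 80%
    and split each 80/20 into train/validation (64%/16% of the full data)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    test = idx[: n // 5]                 # 20% held-out test set
    trainval = idx[n // 5:]
    splits = []
    for _ in range(n_bootstraps):
        boot = rng.choice(trainval, size=len(trainval), replace=True)
        cut = int(0.8 * len(boot))
        splits.append((boot[:cut], boot[cut:]))   # 64% train, 16% validation
    return test, splits

test_idx, splits = make_splits(100)
```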

Table S1:
Multivariate log-rank test comparison across different subtyping methods in terms of -log(p-value). Only samples shared between all datasets are used. "-" indicates that no samples are shared or that subtypes do not exist for TCGA.

Table S2:
Minimum pairwise log-rank test comparison across different subtyping methods in terms of -log(p-value). Only samples shared between all datasets are used. "-" indicates that no samples are shared or that subtypes do not exist for TCGA.