Data-driven inference of crosstalk in the tumor microenvironment

Signaling between cancer and nonmalignant (stromal) cells in the tumor microenvironment (TME) is key to tumorigenesis yet challenging to decipher from tumor transcriptomes. Here, we report an unbiased, data-driven approach to deconvolute bulk tumor transcriptomes and predict crosstalk between ligands and receptors on cancer and stromal cells in the TME of 20 solid tumor types. Our approach recovers known transcriptional hallmarks of cancer and stromal cells and is concordant with single-cell and immunohistochemistry data, underlining its robustness. Pan-cancer analysis reveals previously unrecognized features of cancer-stromal crosstalk. We find that autocrine cancer cell cross-talk varied between tissues but often converged on known cancer signaling pathways. In contrast, many stromal cross-talk interactions were highly conserved across tumor types. Interestingly, the immune checkpoint ligand PD-L1 was overexpressed in stromal rather than cancer cells across all tumor types. Moreover, we predicted and experimentally validated aberrant ligand and receptor expression in cancer cells of basal and luminal breast cancer, respectively. Collectively, our findings validate a data-driven method for tumor transcriptome deconvolution and establishes a new resource for hypothesis generation and downstream functional interrogation of the TME in tumorigenesis and disease progression.


1
The tumor microenvironment (TME) is a multi-faceted cellular environment that both 2 constrains the evolving tumor (Hanahan and Coussens, 2012) and plays a pivotal role in tumor 3 progression and therapeutic response (Junttila and de Sauvage, 2013). Existing experimental 4 methods to characterize the TME, such as imaging and single-cell based approaches, cannot 5 be applied retrospectively to existing large-scale bulk tumor datasets, representing a vast and 6 mostly unexplored resource for studying cancer cell ligand-receptor repertoires and cross-talk   (Fig 4d). This suggests that, in solid tumors, autocrine cancer signaling tends to be tumor type 5 specific and determined by the cancer cell-of-origin. In contrast, stromal autocrine signaling is 6 often conserved and independent of tumor type and tissue. Interestingly, the paracrine 7 signaling interface between cancer and stromal cell compartments also had a high number of   Fig 4e). Among these pairs, the two receptors 5 (ACVR2B and LRP6) were inferred to be overexpressed in cancer cells (~4-fold higher 6 expression) and ligands were inferred to have unaltered expression in SKCM (Suppl. Table   7 1). We then compared the expression of the 2 receptors in malignant and non-malignant cells 8 as inferred by the authors (Tirosh et al., 2016). Confirming our results, both receptors were 9 significantly overexpressed in the cancer compartment (P <1e-10, Wilcoxon rank-sum, Suppl.

17
Autocrine and paracrine crosstalk in brain tumors

18
The two brain tumor types (LGG and GBM) were both characterized by very high autocrine

19
Delta-Notch signaling scores (DLL1-NOTCH1, RC > 80%, Fig 4i). Both DLL1 and NOTCH1 20 were inferred to have >4 fold higher expression in cancer compared to tumor stromal cells and 21 normal brain tissue in both tumor types. DLL1 cancer cell expression was ~16-fold higher than stroma and normal tissue in LGG (Suppl. Fig 10).  Fig 11). EGFR is a well-studied 28 oncogene in GBM where 40-50% of patient tumors have EGFR overexpression driven by gene 29 amplification (Brennan et al., 2013). We inferred >30-fold higher EGFR expression in GBM 30 cancer cells compared to the stromal compartment and normal brain tissue (Suppl. Fig 11). In 31 contrast, all canonical EGFR ligands were expressed at highest levels in the stroma (Suppl.

26
Next, we wanted to validate the TUMERIC predicted differential expression of SFRP1 27 and IL6ST in the cancer cells of basal versus luminal breast cancer subtypes. We performed 28 RNA in situ hybridization (ISH) using specific RNAScope probes generated against either 29 SFRP1 or IL6ST in FFPE sections of breast tumors from each subtype (see Methods, Fig 5d).

30
Cancer cell mRNA expression in each subtype was quantified using two different approaches 31 yielding similar results (Fig 5e and Suppl. Fig 14). Remarkably, and consistent with the 32 TUMERIC prediction, we observed higher expression of SFRP1 in the cancer cells of the basal 33 subtype, compared to the luminal tumors (Fig 5e). In contrast, IL6ST had lower expression in 34 cancer cells of the basal tumors as compared to the luminal subtype.

19
Overall, our study provides unique insights into the complex interactions shaping the 20 TME. Our approach is especially useful in settings where bulk tumor biopsy data is either 21 already abundant or the only feasible data source, which is the case in many clinical settings.

22
We anticipate that the method could complement existing biomarker and target discovery 23 approaches by providing computationally purified molecular profiles of cancer cells as they 24 exist inside human tumors.

37
The datasets were normalized and parameters were set according to authors instructions (see  Here pi denotes the cancer cell proportion (tumor purity) in sample i, and -./-0& and 6"&%$.

26
are average expression levels for the gene in the cancer and stromal compartment (not 27 dependent on sample i), respectively. We make the simplifying assumption that these (non-  Fig 2-5). Secondly, we directly compared results for raw and log-transformed 4 data, and while the results were overall similar, we found that log-transformation provided 5 better separation between inferred cancer and stroma compartment gene expression for 6 known stromal genes (Suppl . Fig 15), which is consistent with previous empirical analysis 7 (Shen-Orr et al., 2012). 8 We found that the equation above tended to misestimate stromal gene expression for 9 genes with somatic copy number alterations (CNA) affecting gene expression in a subset of 10 the samples (for example ERBB2 in HER2-positive breast tumors). We therefore used a 11 modified approach for such genes. We first identified genes with correlation between CNA 12 and mRNA expression (comparing expression for samples with diploid and non-diploid CNA,

13
Mann-Whitney U-test, P<1e-6, to account for multiple testing) in a given set of tumors, and 14 then estimated cancer and stromal compartment gene expression using a two-step approach.