Cross-platform meta-analysis reveals common matrisome variation associated with tumor genotypes and immunophenotypes in human cancers

Background Recent sequencing efforts unveil genomic landscapes of the tumor microenvironment. Yet, little is known about the extent to which matrisome pattern is conserved in progressive tumors across diverse cancer types, and thus its clinical impact remains largely unexplored. Findings Using a newly generated, unified data resource, we conducted cross-platform assessment of a measure of altered extra-cellular matrix (ECM) composition and remodeling associated with tumor progression, termed as the matrisome index (TMI). Parallel analyses with TCGA in over 30,000 patient-derived biopsies revealed that TMI is closely associated with mutational load, tumor histopathology, and predictive of patient outcomes. We found an enrichment of specific tumor-infiltrating immune cell populations, signatures predictive of immunotherapy resistance, and several immune checkpoints in tumors with high TMI, suggesting potential role of ECM interaction with immunophenotyes and tumor immune escape mechanisms. Both epithelial cancer cells and carcinoma-associated fibroblasts are potential cellular contributors of such deregulated matrisome. Conclusions Despite wide spectrum of genetic heterogeneity and dynamic nature, matrisome abnormalities are integral to disease progression. Our resource of a curated compendium of 8,386 genome-wide profiles, molecular and clinical associations, and matrisome-tumor genotype-immunophenotype relationships identify potentially actionable immune targets that may guide personalized immunotherapy.


Introduction 44
The extracellular matrix (ECM) is a complex multi-spatial meshwork of macromolecules 45 with structural, biochemical and biomechanical cues, influencing virtually all fundamental 46 aspects of cell biology [1]. Although rigidly controlled during normal development, ECM is 47 frequently altered in many diseases, including cancer [2,3]. Previously, we had demonstrated a robust association between a pattern of 29 matrisome 56 genes and its predictive value in prognosis and adjuvant therapy response in early-stage non-57 small cell lung cancer (NSCLC) [7]. This specific pattern of matrisome gene expression is 58 termed the tumor matrisome index (TMI). Given that abnormal ECM dynamics are a 59 hallmark of cancer [3], we hypothesized that this pattern of genomic aberrations in tumor 60 matrix is associated with intrinsic resistance to the therapy, patient outcomes and other 61 clinically relevant parameters, across various tumor types. The present study thus extended 62 evaluation of the TMI to 11 major cancer types (Fig. 1A). 63 To facilitate cross-platform meta-analysis of TMI with TCGA, we generated a single, cancer 64 type-specific, merged microarray-acquired dataset (MMD), comprising thousands of 65 transcriptome profiles of both tumor and tumor-free tissues (Tables S1 and S2). Given the 66 derivation from a uniform computational pipeline, MMD accounts for technical bias from 67 preprocessing. 68 better survival for a number of patient cohorts. In contrast, there was no negative correlation 119 between the index and survival in patients with ovarian and gastric cancer; rather TMI was a 120 positive prognostic factor for recurrence and survival in gastric cancers. 121 Clinical associations with not only OS but also progression-free survival (PFS) and 122 metastasis-free survival (MFS) provides better insights into the matrisomal changes during 123 tumor progression and metastatic dissemination. For example, the index was highly 124 correlated with metastasis development in lung (HR = 8.63, P = 0.012), but not in bone (HR 125 = 0.44, P = 0.13), among breast cancer patients (Fig. 2C); this is consistent with the proposed 126 role of extracellular protein encoding genes, such as proteases and chemokines, in promoting 127 metastatic potential [10]. Liver cancer, among others, demonstrated marked predictive 128 capacity of the index for multiple surrogate endpoints (Fig. 2D). Its potential use as a 129 predictor of multicentric occurrence for early-stage hepatocellular carcinomas may guide 130 clinical course and treatment strategies, which depend on classification of subtype of 131 multifocal HCC tumors [11]. 132 Multivariate analyses taking into account age, race, gender, and clinical and pathological 133 TNM status revealed that TMI is an independent predictor of survival in colorectal, liver, 134 breast, renal, ovarian, and gastric carcinomas ( Fig. 2E and Table S12). Comparing with 135 traditional clinical features between two stratified patient groups in TCGA, we found that the 136 high-risk group had a higher proportion of patients staged pathologically as T3 or T4, 137 diagnosed as lymph node and distant metastasis positive, and classified as late stage. 138

There exists a tumor genotype-matrisome phenotype relationship 139
Given the above association with disease progression and stage of cancer, we asked if there 140 was progressive dysregulation of the TMI in the step-wise progression to advanced cancer. 141 Expression of progression markers should gradually increase or decrease during multistep 142 carcinogenesis to be clinically applicable [12]. Across breast, colorectal, and pancreatic 143 cancer, we found a progressive increase in the index during tumor development from normal 144 to adenoma to carcinoma in-situ, and further to invasive carcinoma ( Fig. 3A and Table S13). 145 Molecular subtyping in breast cancer is often associated with its predictive value for 146 prognosis and response to various treatment options [13]. We found that basal-like and 147 HER2+ tumors consistently had higher TMI than luminal tumors ( Fig. 3B and Table S14); 148 paralleling trends were reported in prior works for breast cancer specific survival [14,15]. 149 Collectively, breast cancers exhibit the strongest association of TMI with both molecular and 150 clinical features among all types of cancer investigated. 151 Using TCGA datasets, we found a strong correlation of the TMI with tumor mutational 152 burden (TMB), defined as the total number of simple somatic mutations, in pancreas, 153 stomach, prostate, colon, lung, and breast cancers (Fig. 3C, Tables S15 and S16; see 154 Methods). This suggests a potential role of matrisome molecules in providing unique 155 selective pressures in the tumor microenvironment and increasing genomic instability. Given 156 that non-synonymous somatic mutations are major determinants of tumor immunogenicity in 157 many solid tumors [16], we next asked if there exists any direct association of this abnormal 158 matrisome with the immune landscapes and conducted cross-platform evaluation of TMI in 159 the context of immune response. 160

Matrisome associates with tumor immunophenotypes and escape mechanisms 161
By applying CIBERSORT to MMDs and TCGA, we found that for many cancers, TMI was 162 closely correlated with composition of specific tumor-infiltrating lymphocyte populations 163 (Fig. 4A). Enrichment of these TILs related to both innate and adaptive immunity was 164 diverse and cancer-specific. Relative abundance of M0 and M1 macrophages, neutrophils, 165 activated mast cells, regulatory T cells (Treg), and T follicular helper (Tfh) cells, activated 166 CD4+ memory T cells generally increased, whereas that of resting CD4+ memory T cells, 167 mast cells, naïve B cells, and resting dendritic cells decreased along with tumor-promoting 168 matrix remodeling. Impact on immune infiltrates was particularly pronounced in gastric, 169 lung, colorectal and breast cancers, having significant positive and negative correlations with 170 TMI. This highlights the evolving nature of the immune response during matrisome 171 remodeling in these selected cancers. It is noteworthy that despite having no correlation with 172 mutational load, TMI of some cancer types, such as liver, renal, and ovarian cancers, remains 173 associated with specific TIL composition. This indicates that not only tumor genotypes but 174 also matrisome phenotypes may be critical determinants of tumor immunogenicity and thus 175 inter-tumor heterogeneity of immune infiltration, considering highly heterogeneous 176 expression of ECM molecules [7]. 177 We further estimated functional activation of distinct immune cells, as recently described 178 [17], for their potential role in specific killing of tumor cells upon immunotherapy. Across all 179 cancers except for bladder cancers, the activation state of CD4+ T cells was significantly 180 higher in tumors with high TMI, suggesting their central role in response to abnormalities in 181 the local matrix niche (Fig. 4B). Activations of other innate immune infiltrates such as NK 182 cells, DC cells, and mast cells, were also either positively or negatively correlated in specific 183 cancer types, whereas in others these cell types had no correlation with matrisomal changes. 184 Due to high enrichment of M0 macrophages, functional activities of M1 and M2 185 macrophages were repressed in tumors with high TMI (and thus classified as high-risk based 186 on cut-off index), shifting the tumor microenvironment toward an immunosuppressive 187

phenotype. 188
We next correlated the index directly with the expression of 20 potentially targetable immune 189 checkpoints [18] -that are currently in preclinical or clinical trials, and are  Many of these genes were enriched in tumors with high TMI, potentially responding to 191 expanded CD4+ T cells ( Fig. 4C and Table S17). The highest number of immune targets with 192 correlation was found in lung cancer; those genes whose expression levels are positively 193 correlated with TMI in both MMD and TCGA are shown in Figure 4D. Particularly, B7-H3 is 194 a promising pan-cancer immune marker, which may be selectively efficacious in these high-195 risk tumors. Altogether, these findings unveil specific immune targets that may be selectively 196 efficacious in a subset of matrisome-deregulated carcinomas. We expect that these tumors 197 will be more sensitive to immune checkpoint blockade given that they harbor sufficient 198 immune infiltrates and upregulated immune checkpoints. 199 Given promising applicability of our metrics in prediction of immunotherapy responsiveness, 200 we tested how this specific matrisome would be expressed relative to a previously defined 201 signature of cancers that fail to respond to immune checkpoint inhibitors. By applying innate 202 PD-1 resistance signature (IPRES) [19] to lung MMD and TCGA LUAD, we found that 203 patients who are predicted to be resistant, or 'IPRES-enriched', attained higher TMI than the 204 rest of the patients ( Fig. 5A; see Methods). Extending the analysis to the rest of cancer types, 205 we found that prostate, ovarian, lung, and bladder cancers further demonstrated significant 206 difference in TMI between these two predicted groups (p < 0.01 in both platforms; Fig. 5B). 207 As IPRES is predictive of resistance, not response, we further identified 161-gene 'responder 208 signature' by applying strict criteria to the DE gene list obtained from the original work (see 209 Methods). Except for melanomas, TMI of all cancers had direct negative correlation with 210 GSVA z-scores of responder signature (Fig. 5C), to different extents, with the most 211 pronounced association seen for lung cancer (Fig. 5D). 212

Epithelial cancer cells and CAFs are potential cellular contributors of tumor matrisome 213
We sought to determine the potential cellular source of this common matrisome response 214 observed in bulk tumors. By comparing with previously identified stroma-derived signatures, 215 six ECM genes were reported to be prognostic and differentially expressed in activated 216 stroma, or cancer-associated fibroblasts (CAFs), compared to normal stroma (Table S18)  tumor content, indicating tumor cells may also play a role in producing these specific tumor-224 supporting matrix components (Table S19). 225 The initial screening of micro-dissected ovarian tissues revealed significantly higher index in 226 both tumor epithelial cells and stromal CAFs compared to normal ovarian surface epithelial 227 cells and fibroblasts, respectively (Fig. 6B). Heatmaps of these samples demonstrated shared 228 matrisome patterns in both cell types (Fig. 6C). This pattern was also observed in various 229 malignances including breast, ovarian, liver, colorectal, pancreatic, and prostate cancers, 230 showing their ubiquitous alteration at the cellular level ( Fig. 6D-E and Table S20). Using 231 human/mouse xenograft models coupled with proteomics, characteristic contribution from 232 both tumor and stromal cells to the production of ECM proteins in tumors of differing 233 metastatic potential was recently reported [23]. The mode by which these cells in tumor 234 matrix produce, degrade, remodel and eventually deregulate tumor matrisome and whether 235 these cell types mutually crosstalk or instruct one another to shape tumor-promoting 236 matrisome remain to be uncovered. 237

Discussion 238
This work is based on cross-platform evaluation of TMI in over 30,000 primary tumors 239 across 11 major cancer types. Prior integrative genome-wide profiling studies are limited to 240 single platforms, interrogating either RNA-seq-based TCGA or microarray-based GEO data. 241 Curating a total of 8,836 patient-derived tumor and tumor-free samples, we generated 11 242 cancer type-specific MMDs annotated with clinical features and deposited the data at 243 ArrayExpress (see Data Availability). This approach further minimizes biased selection of 244 validation cohorts and allows parallel analyses with other high-throughput data sources -245 without having to conduct laborious data mining in search for multiple small-scale patient 246

cohorts. 247
Considering that the index was computed under the assumptions of each gene having 248 concordant regulation as observed in lung cancer, it is noteworthy that tumors with highly 249 dynamic and constantly remodeling ECM could have ubiquitously altered expression of 250 matrisome genes across genetically and phenotypically diverse epithelial tumors. The data 251 may indicate the presence of shared signal transduction pathways activated across these 252 selective tumors, such as recently reported TGFβ or HIF1α/VEGF pathways, in controlling 253 ECM balance or cell-intrinsic mechanism regulating the expression of set of ECM genes 254 contributing to tumor development [23]. 255 Here we provide a comprehensive overview of common matrisome variation in the context of 256 tumor genotypes, molecular and clinical parameters, and immunophenotypes. To the best of 257 our knowledge, this is the first study to suggest potential role of ECM molecules in 258 determining immunosuppressive microenvironment and immune escape mechanisms with 259 cross-platform evaluation of a spectrum of major human malignancies. We found a 260 significant enrichment of tumor-promoting immune infiltrates, IPRES, and multiple immune 261 checkpoints in tumors with high TMI. Heterogeneous infiltration within the same immune 262 cell population was also observed across tumors of varying matrisome indices. For example, 263 high-risk lung tumors according to TMI were significantly enriched for M0 and M1 264 macrophages, whereas low-risk tumors were negatively enriched for M2 macrophages, 265 suggesting different mechanisms adopted by tumor-associated macrophages (TAMs) to 266 promote tumor progression. 267 Many of potentially targetable immune checkpoints were enriched in tumors with high TMI, 268 possibly in dulling CD4+ T cell activation, as seen by our CIBERSORT deconvolution 269 analysis. Particularly, we found that B7-H3 emerged as a promising pan-cancer immune 270 target. We expect that such tumors will be more sensitive to immunotherapy due to sufficient 271 immune infiltrates and upregulated expression of immune checkpoints. Tumors with 272 hypermutant phenotypes are regarded as better candidates for immune checkpoint inhibitors 273 across a wide spectrum of cancer types [24][25][26]. Given positive correlation of TMI with 274 mutational burden, it could thus be postulated that tumors with higher TMI would be more 275 susceptible to immunotherapy. Retrospective analyses using IPRES, however, revealed that 276 high-risk tumors associated more closely with signature predictive of resistance to anti-PD-1 277 immunotherapy, and vice versa for low-risk tumors. This suggests that high mutational load 278 may not necessarily imply tumor response, as reported in the original study [19]. In 279 particular, lung cancer was found to have the most pronounced correlation effects with 280 IPRES signatures. Given the predictive role of PD-L1 expression for anti-PD-1/PD-L1 281 therapy in lung cancer [27], enrichment of IPRES in tumors with high TMI may be attributed 282 to insufficient PD-L1 expression in these tumors (Fig. 4C). The findings altogether reinforce 283 the need to adopt a more "holistic" approach to find biomarkers for immunotherapy response 284 that take into consideration tumor, tumor microenvironment and host immunity. 285 Matrisomal changes during the colorectal and breast cancer multistep carcinogenesis were 286 also implicated. This holds particular promise as no more than a dozen or so "driver" 287 mutations, mostly occurring in tumor suppressors and oncogene, are thought to be potential 288 progression markers of colorectal cancer [28]. Likewise, despite much efforts to characterize 289 ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC), there is currently no 290 standard test accurately identifying tumors likely to progress from preinvasive to invasive 291 carcinomas in breast cancer [29]. Unlike limited information that could be inferred from the 292 binary mutational status, the interpretation of progressively changing TMI on a continuous 293 scale as a progression marker will be clinically pertinent and straightforward. 294 Since both tumor and CAF are sources of ECM proteins, targeting both cancer cells and 295 CAFs may represent a better strategy to improve antitumor efficacy by remodeling the tumor. 296 Even if actionable targets are found in cancer cells, excessive ECM deposition followed by 297 CAF accumulation presumably build physical barrier and thus prevent efficient drug delivery 298 [30]. Owing to low likelihood of developing clonal selection and therapy resistance, genetic 299 stability of CAFs further makes them promising therapeutic targets [31]. CAF-targeting 300 approach, however, has been challenged for many reasons, such as lack of specific markers 301 and resulting toxicity [32]. The genes that make up this TMI may provide specific targets that 302 are of relevance for this purpose.  (table S2), and is an independent 311 predictor of survival. This facilitates easy application and practical development into an 312 affordable multi-gene panel that can be assayed with conventional qPCR in clinical practice. 313 These metrics, nevertheless, will require functional validation in prospective clinical studies 314 and necessitate a standard tissue procession protocols for precise measurement. 315

Preprocessing of Datasets 317
All 147 independent public datasets used in this study are summarized in Table S1. 318 Preprocessing methods, number of patient sample, platform assayed, and genes included in 319 the computation of TMI are recorded in Table S2. Raw data of independent studies were 320 RMA-normalized using the affy package [35] or preprocessed-or author-defined normalized-321 data were used as stated in Table S2. Most were assayed with the full 29-gene platform 322 (Affymetrix-GPL570), although some microarray platforms had few missing genes, which 323 are stated in Table S2. As genes without having at least 1 count in at least 20% patients were 324 excluded from preprocessing (described below), different number of ECM genes were 325 included in the computation for the final index for TCGA data. Probes having maximum 326 expression values were collapsed to the genes for subsequent index scoring. 327

TCGA Cohorts 328
Using TCGA-Assembler package [36], the Cancer Genome Atlas (TCGA) data of 11 329 epithelial cancer types were collected, processed, and annotated with clinical parameters 330 (Tables S1 and S2). Due to lack of normal tissue samples, OV and SKCM cohorts, 331 representing ovarian and melanoma cancers respectively, were excluded in DE analyses. 332 TCGA-Assembler R package [36] was used to extract normalized RPKM count values. In 333 each dataset, we excluded genes without minimum 1 counts per million (cpm) or RPMK 334 value in less than 20% of total number of samples were excluded using edgeR package [37]. DownloadBiospecimenClinicalData function in TCGA-Assembler package [36]. 339

Generic ECM Signature 340
We extracted the ranking of 29 ECM genes and visualized their positions in each DE gene 341 list from each cancer-specific TCGA cohort with circular plots generated via the circlize 342 package (Table S5) [39]. To further investigate the extent to which ECM genes exhibit 343 greater degree of DE variation among 29 lung-specific ECM genes across all tumors, we 344 derived a "generic" ECM signature based on a weight computed for each ECM gene using 345 the following formula, which was slightly modified from equation that was used to derive 346 generic EMT signature in a previous study [40]: 347 where D is the total number of diseases (D = 6 in this study; prostate, renal, gastric, 349 colorectal, breast, liver, and bladder cancer), fc gd and P gd are the fold-change and adjusted P 350 value of the ECM gene, g, of disease, d, and n d is the number of patient samples in each 351 TCGA cohort (Table S6). As not all 29 ECM genes were present in the final cancer-specific 352 DE gene list due to preprocessing, each gene was computed with different number of total n i 353 for the weight. "Generic" ECM signature was then derived from 17 ECM genes having a 354 coding genes. Expression profiles of these 29 genes were then extracted from collapsed 364 normalized data and computed for the index, as previously described [7]. 365

Diagnostic Performance 366
Complete lists of the patient ID and respective personalized TMI from both tumor and tumor-367 free samples across microarray and RNA-seq platforms are recorded in tables S7 and S8, 368 respectively. TMI of tumor and normal groups were compared using the Mann-Whitney-369 Wilcoxon test. To further statistically evaluate the diagnostic accuracy of the TMI in deciding 370 the presence of the disease, we computed the area under the receiver operating characteristic 371 (ROC) curve, sensitivity, and specificity with the best threshold determined by the pROC 372 (Table S9). All ROC curves generated were subjected to binormal smoothing for 373 illustration (Fig. 1D). 374

Quantification of TMI Spectrum 375
We analyzed the Expression Project for Oncology (expO) data provided by the International 376 Genomics Consortium (IGC, USA, www.intgen.com) for ECM spectrum quantification, 377 owing to their procurement of tumor samples under standard conditions. This minimized 378 potential non-biological variations across multiple cancer types, and thus allowed us to 379 perform comparative analysis; data from MMDs and RNA-seq cohorts were excluded for 380 possible batch-effects and different number of genes included in ECM risk scoring, 381 respectively. All patients annotated with prostate, lung, renal, liver, gastric, breast, ovarian, 382 pancreatic, colorectal, and bladder cancer from the expO dataset were included in the 383 spectrum quantification (Fig. 2E and Table S10). 384

Patient Stratification 385
For each preprocessed independent datasets (Table S2), cut-off index was determined using 386 the Cutoff Finder algorithm[42] to stratify each patient cohort into low-and high-risk groups. 387 Statistical summary of 72 datasets used in survival analysis is recorded in Table S11.  Meier (KM) survival curves were derived for OS and DSS endpoints (Fig. S1) and other 389 multiple endpoints (Fig. S2)

Molecular Subtypes of Breast Cancer 399
Of the 147 independent datasets, five datasets (BRCA, GSE20711, GSE21653, GSE19615, 400 and GSE50567) were annotated with either ER, PR, and HER2 status or defined molecular 401 subtypes of breast cancers (normal breast-like, luminal A, luminal B, HER2 positive, and 402 basal-like); tumors harboring positive status of ER and/or negative status of HER2 were 403 classified as luminal cancers while the basal-like tumors were defined as ER-, PR-, and 404 HER2-negative cancers (table S14). 405

Total Mutational Burden 406
We obtained the mutational load of TCGA tumor samples across nine epithelial cancer types 407 from the NCI GDC Data Portal (http://portal.gdc.cancer.gov/projects/) using the same TCGA 408 samples used to generate TMI in our previous analyses. The portal defines the mutational 409 load as the total number of simple somatic mutations. Only tumors harboring at least one 410 mutation were included and log 10 transformed to compute Spearman correlations between 411 TMI and mutational load. To further relate the cut-off of TMI (identified in prior survival 412 analyses) to clinical application, we stratified each TCGA cancer cohort into low-and high-413 risk groups and compared the mutational load between two groups using the Mann-Whitney-414 Wilcoxon test (Tables S15 and S16). converted to z-scores. We then obtained the mean z-score for each sample and applied a 420 cutoff of > 0.35, as previously described [19] to define a patient as "IPRES-enriched". We 421 further defined 'responder signature' as 161 highly expressed genes (log FC > 2 and P < 0.1) 422 in responders compared to non-responders using initial 693 DE genes identified in the 423 original work [19]. Similar to IPRES, GSVA z-score was computed and correlated with TMI. 424

CIBERSORT 425
The estimated fraction of individual immune cell types was computed using the beta version 426 of CIBERSORT (http://cibersort.standford.edu/). For any given sample, we calculated 427 Spearman correlation between TMI and relative abundance of each immune cell type using 428 our 11 generated MMDs and 11 TCGA cohorts (LUAD, OV, SKCM, BLCA, LIHC, BRCA, 429 COAD, STAD, KIRC, PRAD, and PAAD). As our MMDs of breast, colorectal, and lung 430 cancer exceeded maximum load capacity (500 MB), 1,000 tumors were randomly selected 431 for input data files. We selected LM22 (22 immune cell types) for signature gene file, 100 for 432 permutations, and disabled quantile normalization for all runs.

Competing interests 465
The authors declare no competing interests.       Multivariate Cox regression analyses using TCGA and comparison with conventional clinical 638 parameters between two groups stratified based on the TMI (Table S13). 639