JAMMIT Analysis Defines 2 Semi-Independent Immune Processes Common to 29 Solid Tumors

Immunophenotype of solid tumors has relevance to cancer immunotherapy, as not all patients respond optimally to treatment utilizing monoclonal antibodies. Bioinformatic studies have failed to clearly identify tumor immunophenotype in a way that encompasses a wide variety of tumor types and highlights fundamental differences among them, complicating prediction of patient clinical response. The novel JAMMIT algorithm was used to analyze mRNA data for 33 cancer types in The Cancer Genome Atlas (TCGA). We found that B cells and T cells constitute the principal source of variation in most patient cohorts, and that virtually all solid malignancies formed three hierarchical clustering patterns with similar molecular features. The second main source of variability in transcriptomic studies we attribute to monocytes. We identified the three tumor types as TC1-mediated, TC17-mediated and non-immunogenic immunophenotypes and used a 3-gene signature to approximate infiltration by agranulocytes. Methods of in silico validation such as pathway analysis, Cibersort and published data from treated cohorts were used to substantiate these findings. Monocytic infiltrate is found to be related to patient survival according to immunophenotype, important differences in some solid tumors are identified and deficiencies of common bioinformatic approaches relevant to diagnosis are detailed by this work.


Introduction
Interest in defining the mechanisms responsible for immunotherapeutic response versus resistance have spawned numerous studies. Solid tumors have a fairly similar clinical response rate to immunotherapeutics, and recent results have suggested that T cell polarization may be a critical element in determining patient response to different pro-inflammatory monoclonal agents 1 .
Early findings suggested that Type I immune responses (those involving high levels of interferon-γ, interleukin-2 (IL-2), perforins and granzymes) are generally necessary for favorable response to anti-cancer monoclonal-based immunotherapy. Interaction of programmed death-1 (PD-1) with its cognate ligand, PD-L1, inhibits IL-2 production and inhibits expansion of TH1-polarized T cells, while blockade of PD-1 and other inhibitory ligands can result in expansion of anti-tumor T cell clones 2 . High numbers of regulatory T (Treg) cells can compete for IL-2 and thus dampen an inflammatory milieu while supporting the polarization of TH17 cells 3 and promoting survival of CD8 + TC17 cytotoxic effector T cells. Tumoricidal activity within the tumor microenvironment is mediated by release of perforins and granzymes upon T cell recognition of antigens displayed on major histocompatibility molecules expressed by host tissues and by natural killer cells, which recognize lipid antigens. The activity of innate immune cells has been less extensively studied in cancer immunophenotype and pro-inflammatory therapies. However, a number of investigations have implied that "M2" macrophages, "tumor-associated macrophages," and "myeloid-derived suppressor cells" can be deleterious to patient health 4-8 .
The Cancer Genome Atlas 9 (TCGA) repository was launched to address the serious nature and high prevalence of the disease with a standardized, well-planned and multi-modal inquiry among large cohorts of diagnosed patients. The project was made possible in part through advances in highthroughput technologies and data acquisition, and public access to this data has prompted analyses from various approaches. An expanding number of these have been published, pertaining to both individual cancer types 10,11 and to pan-cancer studies 12,13 .
Small cohort studies have investigated cancer immunophenotype with promising results 14,15 . However, few analyses have characterized the immunologic similarities among solid tumors that account for the observed broad spectrum of activity of monoclonal anti-cancer agents across multiple tumor types, and no known bioinformatic method has clearly defined the role of CD4 + TH17 and CD8 + TC17 T cells in cancer from transcriptomic data of whole tissue. Also, the role of the cells of the innate immune system has seldom been included in analyses of tumor immunophenotype, impeding development of a broad consensus with regard to the role of tumor-associated macrophages (TAM's) and "M1" and "M2" macrophages in cancer. Members of our group previously developed JAMMIT (Joint Analysis of Multiple Matrices by Iteration 16 ) and applied the technique to both experimentally-acquired data and the publicly-available TCGA database. Here we show results based on JAMMIT analysis of multiple tumor types from TCGA. We relate this to previous work with collaborators that was collected during an experimental study of biliary tumors which utilized a fluorocholine PET tracer 17 and profiled global gene expression concordantly with microarray technology. Utilizing these unique approaches to stratify patient cohorts, we provide evidence of the basis for response to immunotherapy for which standard bioinformatic approaches have often sought but have not detailed a strong consensus, while also suggesting new directions in diagnosis, therapy and inquiry.

Loadings Plots of JAMMIT Signatures Define a Characteristic "3-Peaks" Signature Common to Solid
Tumors. We used JAMMIT to analyze messenger RNA (mRNA) data available in TCGA (https://www.cancer.gov/tcga). JAMMIT uses a "sparse" version of the singular value decomposition to reduce a pre-processed data matrix to a small signature of several hundred genes with high variation among samples by means of a sparse matrix approximation of rank-1. Although JAMMIT is highly adaptable for performance of multi-omics studies, analyses in this publication were primarily unimodal.
Because JAMMIT produces a rank-1 approximation, some discretion was used in manual adjustment of clustering schemes resulting from hierarchical clustering of samples according to JAMMIT solutions.
Upon plotting JAMMIT solutions, a highly conspicuous "3-peaks signature" emerged. Organization within this 3-peaks signature reflects arrangement of gene symbols into sorted alphabetical order. Thus, a small peak of CD (cluster of differentiation) markers, chemokines and chemokine ligands 18 (with abbreviations beginning with CCL, CCR, CXC and the like) comprise a peak at the left of the loadings plot ( Figure 1A-F), while genes of VDJ recombination 19 form two distinct peaks located near the center and at the far right of the loadings plots, respectively. The central peak (corresponding to immunoglobulins) consistently contributed most to the overall solution, while T cell receptor-related genes consistently represented the second most variable element among cohorts. A less conspicuous fourth peak relating to FC receptors 20 was also visible on most plots. To further clarify the significance of the common gene expression among tumors suggested by the loadings plot, several heatmaps of these genes have been included (Figure 1-figure supplement 1A-F). These plots testify to a higher or lower clonal diversity and relative abundance of B cells and T cells among samples independent of tissue type. These findings also demonstrate that the greatest source of variability in individual patient samples measured by bulk mRNA expression in 29 of 31 solid tumor types was related to immune system function and the relative abundance or absence of its cellular components.
Individual JAMMIT solutions had high overlap and with noted exceptions, were primarily immunologic in character. Another subset of genes identified in typical JAMMIT solutions represented mRNA expressed by tissue at the anatomic site of tumor occurrence. This formed the basis for a dichotomy between TC1mediated tumor microenvironments and non-immunogenic tumors. To emphasize common features of these solutions, we produced an UpSet figure 21 that shows their intersection. As suggested in Figure 1G, solutions showed a modest intersection of genes across a range of tumor types. Many genes involved in the larger intersections shown are related to VDJ recombination; several immunotherapeutic targets and cluster of differentiation markers are also included. To make the character of JAMMIT signatures more coherent, we isolated genes common to the 33 JAMMIT solutions. After removing genes of VDJ recombination, average expression values of the top 500 genes common to JAMMIT solutions were displayed on heatmaps (see also Figure 1 -supplementary file 1). Ingenuity Pathway analysis (IPA) of these 500 genes characterized them as associated primarily with TH1 and TH2 activation pathways. These heatmaps show that JAMMIT isolates genes that separate cancers into distinct phenotypes independent of anatomical location.

Functional Analysis of JAMMIT Signatures.
To identify biological processes captured in the JAMMIT signatures, we again used IPA. Though precise details varied, IPA returned generally similar results for all cancers. Three pathways significantly involved in all 33 cancer types suggested universal presence of neutrophils and monocytes. In most cohorts, IPA also identified the TH1, TH2 and TH17 activation pathways as top canonical pathways with p < 0.05. These results are presented as a bar chart (Figure 2A) and also as an UpSet figure (Figure 2-figure supplement 1A). Several cohorts stood as relative outliers, but a majority of tumor types intersected in some 30 to 40 molecular pathways.
To validate these findings, we used signatures of genes expressed by innate lymphoid cells 22 and markers of Type I, Type II and Type III immune responses 23 . As shown in Figure 2B,C and Figure 2figure supplement 1B,C, these results support IPA's identification of one cluster of samples as mediated by Type I immunity. However, genes of Type II and Type III immunity did not show consistent, high differential expression of similar quality. We did not interpret this finding as suggesting that Type II or Type III immunity are strictly irrelevant to solid tumors. Rather, we interpreted these results to imply that only Type I immunity can be recognized in standard approaches to bioinformatic inquiries of solid tumors. Ordinary analysis of differential expression was not sufficient to distinguish all immune responses in the tumor microenvironment. It is important to mention that no known epidemiological studies have drawn strong ties between TH2-mediated disease such as allergy or asthma to high incidence of cancer. The presence of CD4 + TH17 T cells and CD8 + TC17 cells have been difficult to characterize in bioinformatical analyses of whole-tissue specimens (see for example publications by Thorsson,et al 24 ,Hoadley,et al 13 ,or for an analysis more similar to the present work, see Zhou,et al 15 ). We therefore tentatively identified the clusters present as TC1-mediated, TC17-mediated and nonimmunogenic immunophenotypes, then characterized these clusters further.

Hierarchical Clustering According to JAMMIT Solutions Defines 3 Phenotypic Clusters for Multiple Solid Tumors.
Reduced matrices of mRNA values to genes determined by JAMMIT solutions produced heatmaps with conventional appearances (Figure 3A-F). Following after the similarity of JAMMIT solutions and loadings plots, hierarchical clustering revealed similar biological processes in multiple solid tumors. B cell-expressed genes clustered tightly along the rows with high differential expression among clusters. Genes of T cell receptors (including those of γδ T cells 25-) ordinarily clustered closely with genes expressed by myeloid cells. Inflammatory markers, B cell-related transcripts, and nearly every immune cell-specific transcript, including markers of Type II and Type III immunity demonstrated highest expression in a single cluster.
Conversely, highest expression of a small number of JAMMIT-defined genes typically characterized nonimmunogenic samples. Intermediate expression of most JAMMIT solution-defining genes was generally found in the "middle cluster," which became a focal point of our studies. In every dataset with the distinctive "3-peaks" pattern, this middle cluster showed modest expression of B cell and T cell transcripts, chemokines and cytokines compared to the inflamed cluster, but relatively high expression of the same genes by comparison with the immunologically silent specimens. Ranking and enumeration of highly-expressed genes in this middle cluster produced results that were neither intuitive nor distinct. Comparison of only a few datasets made it clear that there was no strong or obvious biological process related to high differential expression in these weakly-immunogenic samples. The estimated proportions of the 3 phenotypic clusters in each of 29 solid tumor types for which such a clustering was feasible is shown in Figure 3

-figure supplement 1B.
A subtle feature of the hierarchical clustering patterns is highlighted for 977 cases of breast cancer ( Figure 3G). Standard bioinformatic procedures produced three large clusters of samples with distinct molecular features. However, a closer view reveals that there are actually six hierarchical clusters present, with immunologically similar clusters interrupted by juxtaposition of mathematically similar clusters. As observed at the lower portion of the clustergram, the second and fourth clusters share a similar feature in relatively low expression of a number of genes at the bottom of the heatmap, but are distinguished from one another by a higher clonal diversity and abundance of B cells in the second cluster. The fourth cluster of tumors has molecular features similar to the cluster at the far right and separates two immunologically related clusters on either side. Samples within clusters 3 and 5 differ in the expression of genes related to the "Warburg effect," as do the first two clusters at the left of the heatmap. These differences proved to be related to the degree of granulocyte and agranulocyte infiltration according to IPA (Figure 3

-Supplementary file 1).
This leads to discussion of another finding of our work performed with PET imaging.
A 3-gene Warburg Signature is Correlated with Monocytic Infiltrate, Classical Markers of "M2" Macrophages and Survival. Positron emission tomography has been applied in clinical imaging for over 25 years, and has often utilized a fluorodeoxyglucose tracer 26 . With collaborators 17 , we employed an experimental fluorocholine tracer and found that some tumors had high uptake of glucose but poor avidity for fluorocholine. We identified these as "Warburg" tumors, after Nobel laureate Otto Warburg. Warburg recognized that abnormal fermentation was present, and claimed that cancer was characterized by upregulated glucose uptake and aerobic glycolysis 27 . In more modern studies, this Warburg effect has been characterized in terms of enzymes involved in glycolytic pathways 28,29 . This has also been more recently called the "Reverse Warburg Effect 30,31 ," in recognition of the involvement of multiple cell types in establishing a "lactate shuttle" among cancer-associated fibroblasts 32 and malignant tumor cells. Through application of JAMMIT to microarray data supervised by a PET scan enabled pharmacokinetic parameter, feature reduction techniques allowed derivation of a 3-gene signature (PKM, SLC16A3 and SLC2A1) that preferentially identified tumors with poor uptake of fluorocholine.
We sought to explore this signature's relevance. Upon examination of TCGA data ordered by ranked expression of this signature, we found that samples with high and low expression differed significantly in expression of classical markers of "M2" macrophages 33,34 across cohorts ( Figure 4A,B). Cibersort deconvolution of mRNA data validated this finding with estimates of increased monocyte infiltration ( Figure 4C-F). In concord with findings of others who have explored the relevance of myeloid-derived suppressor cells 6 and "M2" macrophages 7 , we found that this 3-gene signature had prognostic value in multiple patient cohorts (see representative examples in Figure 4G,H).
Some have suggested that variation in myeloid cells subsets may be associated with different patient outcomes 35,36 . From our vantage point, studies of the tumor microenvironment only provide information concerning the relative abundance or scarcity of myeloid cells. According to JAMMIT solutions, differences in immunophenotype overshadowed any differences in innate immune cell infiltrate; thus we emphasize once again that the primary difference in global transcriptome among patients with solid tumors in most cohorts is defined by immunophenotype, while the differences in innate immunity are a cause of secondary variation. However, we did note that when viewed in isolation from other immunophenotypes, patients bearing TC17-mediated tumors who had high levels of monocytes within the tumor microenvironment had an unexpected survival benefit compared to those who had few (Figure 4-figure supplement 2). Whether this reflects a functional difference in the myeloid cells in these patients is unclear. Speculatively, the prognostic difference may reflect a difference in the cytokine influence of Type III immunity upon innate immune cells or immune cell-cell interactions. Our interpretation of a lack of Type II immunity in cancer does lead us to favor the opinion that eosinophils are not prominent in ordinary solid tumors, however.

The Warburg Effect and Markers of Activation and Differentiation.
Observed modulation of macrophage gene expression by lactate has been reported 37 . Similarly, lactate can affect gene expression of other immune cell subsets 38 . These facts and others led us to inspect a wider panel of standard and novel immunotherapeutic targets for modulation by this Warburg effect in other cohorts. Similar to markers of M1 and M2 macrophages and TAM's, transcript expression of these genes and other well-known immune cell surface markers mirrored the expression of B cell and T cell receptor gene distribution with regard to the immunophenotypic clusters, but exhibited more modest differential expression (Figure 4 -figure supplement 1A,B). Given that many of the targets of modern immunotherapy are expressed by T cells, we found it particularly noteworthy that high levels of the Warburg effect accompanied high expression of many of these ligands. This was often true despite a modest reduction in the number of infiltrating T cells (see next section and Figure 4C-F).
The TC1-Mediated Tumor Immunophenotype. In the case of TC1-mediated TME's, strong evidence provided by IPA analysis suggested that the cluster was characterized by Type 1 immunity much as defined by Annunziato, et al 23 . High expression of genes specific to Type 1 innate lymphoid cells in multiple tissue types (see again Figure 2C and section on Innate Lymphoid Cells below) generally corroborated this finding. At the molecular level, these clusters showed high differential expression of hallmarks of Type 1 immunity, such as IFNG, IL-2, IL-12, T-bet, and eomesodermin in most tissue types. In spite of IPA's identification of "TH1-TH2 pathway" as a top canonical pathway, markers of Type 2 immunity such as CRTH2 39 , IL-4, IL-5, IL-13 and GATA3 40 often showed no consistent, organized pattern of differential expression in these clusters. Highly conspicuous was the finding that essentially all genes related to B cells (CD79A, MS4A1, MZB1, PAX5, etc.) and B cell-chemotactic factor CXCL13 exhibited highest average expression in this cluster. Similarly, the TC1-mediated cluster contained the highest average expression of more than 50 T cell transcripts, consistent with highly-infiltrating TC1 CD8 + and TH1 CD4 + T cells that display activity against intracellular pathogens 41 . Though a better understanding of the relationship between Treg cells and TH17 CD4 + T cells has emerged 42 , the finding of FOXP3 as a marker of Treg cells having their highest abundance within these clusters is novel. Most of these findings applied uniformly across all solid tumors. These findings indicated a generally higher level of immune cells in the TME with a predominant signature characterized by TC1 cells.
The Non-Immunogenic Tumor Immunophenotype. For the third cluster of samples with a "cold" appearance on heatmaps, we investigated mutations of β-catenin based on previously published evidence 43,44 an established high prevalence of β-catenin mutations among multiple tumor types 24 , and molecular features consistent with immuno-exclusive TME's, a.k.a. "immune deserts." These immuneexclusive microenvironments were present in every tissue type (see again Figure 3A-G) and comprised one-third of the >9,000+ tumors of the 29 solid tumor types studied for which 3 distinct clusters could be identified. Generally, TC17-mediated tumors were distinguishable from non-immunogenic counterparts by more than 100 differentially-expressed genes of VDJ recombination. In pairwise comparisons, at least as many of these genes withstood Bonferroni correction in every dataset with at least 200 samples. One exception was found for the low-grade glioma (LGG) cohort, which had a peculiar aberration of low CXCL13 expression within the TC1-mediated cluster. Mutations of genes in the Wnt-β-catenin signaling pathway are known to disrupt cell-surface expression of proteins necessary for normal physiologic T cell adherence to the exterior of malignant cells 43 , which may explain why genes of T cell receptors would be absent from these tumors.
The status of this "non-immunogenic" cluster first became clear upon review of the literature of pertaining to bioinformatical analysis of hepatocellular tumors by both the TCGA research network authors 11 and very similar results produced by another team 45 . In both cases, the rate of β-catenin mutations among large cohorts of patients with hepatocellular carcinoma was found to be ~27%, matching rather precisely with hierarchical clustering according to our own JAMMIT-defined solution for hepatocellular carcinoma (LIHC).
Because we did not use raw DNA sequence data in our analysis, we felt it appropriate to further confirm the non-immunogenic nature of these samples with the data available at hand. Based on expression of an 89-gene signature recognized by IPA as pertinent to Wnt-β-catenin signaling, we produced heatmaps of differentially expressed genes for several tumor tissue types. A typical result showing generally higher expression of most genes of the canonical Wnt-β-catenin signaling pathway within TC1-and TC17mediated clusters is shown in Figure 5 -figure supplement 1A.
The TC17-Mediated Tumor Immunophenotype. The molecular similarities shared by TC1-mediated tumors included a long list of immune cell transcripts demonstrating high expression confined to that cluster, but the middle cluster was not as simply identifiable as being infiltrated by TC17 and TH17 T cells. IPA analysis failed to identify the TH17 pathway among the most highly significant top canonical pathways associated with this cluster in multiple tissue types. TC17/ TH17 signature genes (CD73 46 , DPP4 47 , RORA, RORC 48,49,50 ) and genes related to TH17 function (IL-6 51,52 ) showed little or no differential expression across most tumor types. Another well-known marker of TH17 CD4 + T cells, IL-17, exhibited little differential expression in TCGA cohorts and has been shown to have neutrophils as its primary physiological source of relevance 53 . However, at the bulk level, these tumors were clearly distinguished on heatmaps (Figure 3A-F). The differences already noted in the distribution of genes of VDJ recombination pointed to an unequivocal and consistent difference in the abundance of adaptive immune cells attributable to a phenotypic distinction. Recalling that genes of high differential expression within the weakly immunogenic clusters of multiple cohorts showed little to no overlap, we proceeded to analyze these samples with the assumption that they differed from both TC1-mediated samples and non-immunogenic tumors. After clustering each of the datasets according to its corresponding JAMMIT solution, with small manual adjustments to hierarchical clustering results, we characterized the middle cluster for each cancer type in a series of six pairwise comparisons. Average transcript expression level and a Student's t-test applied to the entire transcriptome allowed construction of 6 UpSet figures (Figure 5 -figure supplement 1B-G). We chose to use an arbitrary limit of 500 genes for each pairwise comparison to construct these figures. These data (represented in Figure 5 -figure supplement 1B, E) show that TC17-mediated tumors have a gene expression profile which is similar to TC1-mediated tumors when compared to non-immunogenic tumors, but reflects the gene expression profile of a non-immunogenic tumor by comparison with the highly-infiltrated TC1-mediated tumors (Figure 5 -figure supplement 1D, F).
Explanation of why TC17-mediated tumors would have moderate levels of immune infiltrate in wholetissue specimens came from a study of hepatocellular tumors by Dong-Ming, et al 54 . This study showed that IL-17-secreting CD8 + T cells were found concentrated at the invading edge of tumors, rather than throughout malignant tissue. This is consistent with the biological function of TC17 T cells, which act by inducing cells of host tissues to secrete defensins and other anti-microbial peptides 55 and enhance proliferation through MAPK signaling 56 . In normal physiology the secretion of antimicrobial peptides has an adaptive role, but activation of MAPK in cancer is capable of promoting tumor expansion. In the context of Type III immunity, cellular proliferation is further reinforced through IL22 signaling from Type 3 innate lymphoid cells and TH22's 57 via induction of STAT3 signaling 58 . Thus, low expression of eomesodermin 41 , granzymes and perforins characteristic of TC17 cells cause these tumors to be relatively depleted of immune infiltrate while tumor growth is facilitated. Cibersort 59 estimates validated the lesser abundance of both CD4 + and CD8 + T cells in TC17-mediated tumors in nearly every tissue type examined, except in renal clear cell carcinoma, thyroid and ovarian carcinoma cohorts, where TC17 cells or TH17 cells were estimated to outnumber TC1-polarized cells in one of these compartments (see Figure  5E-H).
Though it was not a primary aim of this study, it is worth noting that immunophenotypic cluster conferred no significant survival benefit in the handful of datasets we examined individually. Again, at the level of the entire TCGA cohort of patient samples, we found that the Warburg effect was prognostic (Figure 5A-D).

Cross-validation with Data from a Treated Cohort.
Given the recent findings in pre-clinical models that suggest tumors infiltrated primarily by TH17-polarized cells do not respond well to single-agent immunotherapy 1 , we sought to validate our interpretation of the biological significance of the hierarchical clustering we observed in multiple tumors types. To carry out this validation, we downloaded RNA sequencing data from a cohort of melanoma patients with corresponding information on response to therapy from dbGap 25,60 . Unfortunately, data acquired from these patient cohorts used a sequencing read depth with insufficient coverage to make our JAMMIT solutions of optimal utility in this setting. Therefore, to investigate whether the middle cluster represented tumors infiltrated by TC17 and TH17 T cells, we took into account published data in which the gene expression profiles of patients were measured using a Nanostring 61 platform during courses of anti-PD-1 therapy for treatment of metastatic melanoma. As Figure 6 shows, the gene expression pattern of patients classified as having a more favorable response to therapy in melanoma is more consistent with that of samples identified by JAMMIT analysis as TC1-mediated specimens rather than with TC17-mediated tumors in several cohorts. This was confirmed using Fisher's exact test. Though some of the patients in this small cohort were not treatment-naïve, this analysis suggests that Nanostring may be a highly cost-effective platform for measuring gene expression in this context when compared to deep-sequencing of whole tissue specimens.
In performing these validations on multiple datasets, we also discovered a limitation of gene expression-based approaches used to predict or measure patient response to immunotherapy. As can be seen for the TCGA cohorts for squamous cell cancers of the lung and head and neck, and also for endometrial cancers, the gene signature that characterized patients responding to anti-PD-1 therapy in melanoma did not extend to all cancer types. In order to investigate the potential reasons for this restriction, we therefore deepened our investigation of these outliers and revisited previous analyses.

Innate Lymphoid Cells and JAMMIT Analyses.
Based upon Cibersort estimates that showed high numbers of monocytes and low numbers of T cells for several datasets, we suspected that innate immunity might be a cause of the different gene expression profiles in our validations pertaining to patient response to immunotherapy and the proposed TC1-TC17-Non-Immunogenic clustering scheme. With reference again to the previous figures showing that a broad range of immune cell-surface markers, immunotherapeutic targets and TAM markers generally showed high expression in TC1mediated tumors and in concert with high levels of the Warburg effect ( Figure 4A,B and Figure 4figure supplement 1A,B), it was difficult to explain the opposing trend observed in the head and neck squamous cell carcinoma (HNSC) and lung squamous cell carcinoma (LUSC) cohorts (Figure 7 -figure  supplement 1A,B).
This seemingly unusual departure from other solid tumors was explained by examining expression patterns of genes expressed by innate lymphoid cells. All three major lineages of innate lymphoid cells have been sequenced at the single-cell level 22 , and as genes specific to each lineage were examined (see again Figure 2C), we also took into account a larger signature of genes (222 in all) common to all three lineages. As observed with some large TCGA cohorts, high expression of the genes of this signature again clustered preferentially among samples displaying the TC1-dominant immunophenotype (Figure 7A,B).
Here LGG and LIHC serve as two "prototypical" cohorts (see also Figure 7 -figure supplement 2A,B). Equivalent analysis of the corresponding cohorts for HNSC and LUSC produced very different results, however ( Figure 7C and Figure 7 -figure supplement 2C). From the results of these parallel analyses, we conclude that hierarchical clustering of JAMMIT solutions for TCGA datasets defines a TC1-TC17-Non-Immunogenic trichotomy of solid tumors for all cohorts of patients with solid tumors exhibiting significant immune activity among at least a certain proportion of samples. However, these immunophenotypes do not always align predictably with the distribution of the cells of innate immunity. The presence of monocytes within tumor microenvironments, particularly in high numbers, appears to have the capacity to disrupt an orchestrated immune response and can skew interpretation of gene signatures.

Discussion
The search for biomarkers to predict patient response to immunotherapy has often utilized measurements of gene expression taken from the tumor microenvironment. Our studies suggest that one confounding factor in analyses of this type is a failure to take into account the variable number of innate immune cells within the tumor microenvironment being assayed. Another problem in these approaches has been that insufficient read depth has placed bioinformaticians at a great disadvantage in discerning various immune and non-immune phenotypes present in most solid tumor types.
As part of these investigations, JAMMIT analysis allowed us to approximate a TC1-TC17 dichotomy present in multiple solid tumors from TCGA that previous bioinformatic techniques have not addressed. Tumors with higher numbers of TH17 CD4 + T cells have been demonstrated in pre-clinical models to be correlated with a lack of effectual treatment response to one or more forms of single-agent immunotherapy utilizing monoclonal antibodies. Thus, the addition of a TGF-β-blocking agent may improve patient response rates by approximately 100% compared to single-agent therapy. In at least one validation dataset outside of the TCGA compendium, JAMMIT was unable to produce such a distinction among the samples of a large patient cohort (n = 164). This may be due to insufficient read depth of the mRNA sequencing assays. We also concluded that another reason bioinformatics approaches have failed to identify TC17-mediated tumors as a distinct immunophenotype is that they have relied upon high differential expression of genes as a standard to characterize biological processes. However, we found that data produced by our analyses strongly implies the existence of a set of tumors for which gene expression is intermediate globally in large cohorts where 3 phenotypes are present in significant numbers.
We also showed that multiple types of cancers of squamous epithelial cell origin likely had high numbers of monocytes, setting them apart from other solid tumor types. Our findings suggest that T cell infiltrate is inherently dampened in these cases, and this is likely to extend to some other cohorts (e.g. squamous cell cancers of the skin). Attempts to validate a gene expression signature proven to predict a response to treatment in melanoma extended well to other tumor types, but not to squamous epithelial tumors. This implies that no universal gene signature exists for responders, suggesting that the basis for patient response to immunotherapy is a characteristic intrinsic to the immune cells themselves (by implication, it is the polarization and receptor sequence of T cells). Previous work with collaborators developed a 3gene signature in investigation of PET imaging, providing evidence to suggest that this signature has a high correlation with monocytic infiltrate. Thus, other tumor types may potentially be diagnosed through a similar radiographic technique. There is also an implication that infiltration by monocytes can influence the behavior of T cells and may even regulate their numbers. Current literature suggests the concentration of lactic acid in the tumor microenvironment is a probable factor related to this blunting of T cell responses. We also note that the 3 genes in the signature we have used as a proxy for monocyte infiltrate have a close relationship to lactic acid metabolism ,28,32 , consistent with what has been described as a "Reverse Warburg Effect." This Reverse Warburg effect has been described in multiple cancers and has been reproduced in murine models 62 .
Finally, we showed by way of this proxy measurement of monocytes that these innate immune cells are likely a primary cause of poor patient prognosis in a cohort of several thousand. This suggests that targeting the innate immune system may be a valuable means of immunotherapy in addition to current methods in use. In particular, reducing bloodborne GM-CSF may represent a feasible approach 63 to reduce monocyte production by the bone marrow. A caveat to this notion is that some tumor types appear to have very little immune infiltrate other than these monocytes to act as a defense for the host.

Data preprocessing
After download, each individual TCGA cohort was preprocessed for JAMMIT analysis by removal from the expression matrix of all genes with 75% or more entries equal to zero. Data was log-transformed according to a generalized log transformation method 64 ("glog"), followed by quantile normalization. Centering of each individual matrix row is performed by subtraction of average value from the row.

JAMMIT analysis
JAMMIT runs for these analyses were set to 1,000 permutations each. Solutions were chosen according to an approximate signature size of 1,000 genes rather than false discovery rate (FDR). This produced an FDR of less than 0.05 for all cohorts except CHOL, THYM, UCS and UVM.

UpSet plots
UpSet plots were produced using the online tool available as referenced in the text.

Ingenuity pathway analysis
Top canonical pathways were determined using IPA analysis. Singular value decomposition of matrices formed by JAMMIT solution signatures produced loading values for each gene to be uploaded to IPA for individual cohorts. Only significant pathways with p < 0.05 were used in the figures shown.

Gene signatures
The 3-gene "Warburg" signature was derived during a previous study as referenced in the text. Apart from JAMMIT solution signatures, other gene signatures used in this publication were based on referenced publications, IPA or curated manually.

Hierarchical clustering
Hierarchical clustering was produced using TreeView with Pearson's correlation of both rows and columns and complete linkage. Per-gene normalization 65 of data was employed to create heatmaps of reduced signature size (except for Fig. S1, which is based on average gene expression).

Cibersort validation
Cibersort validations were conducted with 1,000 permutations using absolute mode and a sample signature matrix file supplied at the Cibersort website among a list of examples.

Immunotherapeutic efficacy validation
After download of published data, validation of patient response to immunotherapy was performed by segregating responders to anti-PD-1 therapy and non-responders. Differential expression analysis of the panel of genes included in this analysis (including just under 800 genes of primarily immunologic expression) was conducting using a one-tailed homoscedastic Student's t-test. Transcriptomic mRNA data for significantly differentially expressed genes were then isolated from the matrices of the TCGA patient cohorts shown in Fig. 6. The submatrices thus created were analyzed using the singular value decomposition to generate scores for each sample. Rank listing of samples according to score and sectioning into quartiles based upon those scores and comparison with JAMMIT solution-derived clustering produced the results shown.

Routine statistical calculations
Results of Fisher's exact test were produced using the bioinformatic tool online at https://www.scistat.com/statisticaltests/fisher.php. Kaplan-Meier plots were produced using the online statistical tool at https://astatsa.com/LogRankTest/. Perl (Indigoperl build for Microsoft Windows, available through www.indigostar.com) was used primarily for sorting data and performing other nonmathematical operations not readily adapted to use of Excel.

Data availability
Data used in all analyses described here are available through the Genomic Data Commons as part of the TCGA project. DbGap data and other data used in validation shown in Fig. 6 are available as referenced below.

Figure 2. JAMMIT solutions isolate similar molecular features shared by different cancers. (A)
Ingenuity Pathway Analysis identified significant cellular and molecular pathways associated with loadings derived from JAMMIT solutions for TCGA mRNA databases. All pathways shown had significance of at least p < 0.05. Note that the overwhelming predominance of immune-related pathways indicates similar immune processes are common features of tumors irrespective of cancer or tissue type. (B and C) Features of Type I, Type II and Type III Immunity in JAMMIT solution-derived hierarchical clustering schemes. In each pair of heatmap panels, general immune markers are shown at the top, while lineage-specific markers of innate lymphoid cells are shown below. Differential expression occurs primarily among markers of Type I immunity and secondarily among markers of Type III immunity, but high expression is restricted on average to tumors mediated by Type I immunity.  TCGA mRNA databases. All pathways included had a threshold of significance of p < 0.05. B and C Features of Type I, Type II and Type III Immunity in JAMMIT solution-derived hierarchical clustering schemes for C THCA and D STAD. In each pair of heatmap panels, general immune markers are shown at the top, while lineage-specific markers of innate lymphoid cells are shown below. Note the similarity to Figures 2B and 2C.   gene expression of agranulocytes. The clusters are produced by hierarchical clustering of 977 samples according to a 1056-gene JAMMIT signature and show high differential expression primarily among TC1mediated and non-immunogenic samples. B The relative proportion of estimated immunophenotype is plotted as a percentage of all samples for each of 29 solid tumor types. In the case of ACC and also for CESC, no division of samples into an analogous clustering scheme was attempted.