Abstract
Glioblastomas (GBMs) are biologically heterogeneous within and between patients. Many previous attempts to characterize this heterogeneity have classified tumors according to their omics similarities. These discrete classifications have predominantly focused on characterizing malignant cells, neglecting the immune and other cell populations that are known to be present. We leverage a manifold learning algorithm to define a low-dimensional transcriptional continuum along which heterogeneous GBM samples organize. This reveals three polarized states: invasive, immune/inflammatory, and proliferative. The location of each sample along this continuum correlates with the abundance of eighteen malignant, immune, and other cell populations. We connect these cell abundances with magnetic resonance imaging and find that the relationship between contrast enhancement and tumor composition varies with patient sex and treatment status. These findings suggest that GBM transcriptional biology is a predictably constrained continuum that contains a limited spectrum of viable cell cohabitation ecologies. Since the relationships between this ecological continuum and imaging vary with patient sex and tumor treatment status, studies that integrate imaging features with tumor biology should incorporate these variables in their design.
Introduction
Glioblastoma (GBM) is the most common malignant primary brain tumor in adults and is known to be genetically, transcriptionally, and compositionally heterogeneous within and between patients (1–3). A therapy that is effective against one tumor may be ineffective against another, and even cells within the same sample can have different therapeutic sensitivities (3). Since patients typically undergo a single surgery, at which time malignant cells are invariably left behind due to the diffuse nature of the disease, the ability to use routine magnetic resonance imaging (MRI) to non-invasively and longitudinally characterize tumor heterogeneity is invaluable.
Several studies have sought to characterize the heterogeneity of GBM using bulk and single-cell transcriptional profiling (4–9). In general, these studies have discretized GBM into biological groups and have focused on characterizing tumor cell populations rather than non-tumor cells. These simplified models of high-dimensional data, while easily interpretable, may not fully capture the biological reality of GBM. Mounting evidence suggests that GBM diffusely infiltrates the brain and cross-talks with normal cells (10–12). These pioneering tumor cells engage their environment, interact with the immune system (which can be co-opted by the tumor to facilitate its growth and invasion), and trigger other reactive responses (e.g., reactive astrocytes, expansion of progenitor cells). Under this conceptual framework, the biology of a tumor sample is a direct reflection of its cellular constituents and the environmental variables that influence them.
MRI is the mainstay of treatment assessment for GBM, but we do not fully understand how signal intensities from these images reflect the underlying biology. Under the prevailing gold standard for the radiographic evaluation of GBM, there are four categories of tumor responsiveness: complete response, partial response, stable disease, and progressive disease (13). Assignment to these categories is determined by the changes in the contrast-enhancing (CE) and non-enhancing (NE) tumor volumes in light of patients’ steroid usage and clinical status. This paradigm is fundamentally limited by the non-specificity of contrast enhancement on T1-weighted MRI and T2/FLAIR MRI hyperintensity. This is evident with the use of anti-angiogenic agents (i.e., bevacizumab), where an acute reduction in capillary permeability leads to a decrease in intratumoral contrast enhancement without a change in tumor burden (14). Conversely, about a third of patients demonstrate an increase in T1 contrast enhancement and progressive T2/FLAIR abnormality upon the initiation of temozolomide and radiation in a phenomenon called pseudoprogression (14, 15). It is assumed that T2/FLAIR hyperintensity represents infiltrative tumor, but it could represent several unrelated entities including vasogenic edema, microvascular ischemia, and leukoencephalopathies.
In this study, we leverage a manifold learning algorithm to discern the continuous nature of GBM’s transcriptomes and the cellular compositions associated with them. We connect the resultant spectrum of tumor ecologies with MRI features in light of covariates like patient sex and tumor treatment status. In doing so, we assess the ability to infer tumor biology from routine imaging.
Methods
Patient populations and tumor biopsies
525 patients (205F, 320M) with GBM underwent tissue characterization as a part of The Cancer Genome Atlas (TCGA) effort. Microarray data from these samples were accessed using UCSC Xena (16). After obtaining informed consent, we enrolled 44 patients (18F, 26M) with primary or recurrent GBM in our IRB-approved study. Patients underwent surgical tumor resection with intraoperative navigation, and a total of 157 image-localized biopsies were harvested. The tissue was flash-frozen, and the Illumina TruSeq v2 RNAseq kit was used to prepare sequencing libraries. An Illumina HiSeq 4000 sequencer obtained paired-end reads, and FASTQ files were aligned to a reference genome (GRCh38.p37). Read counts were compiled using htseq-count, and batch effects were corrected with ComBat-Seq (17).
MRI acquisition and image localization
Pre- and post-contrast T1-weighted (T1W), pre-contrast T2-weighted (T2W), and fluid-attenuated inversion recovery (FLAIR) sequences were acquired at 3T field strength. Acquisition parameters were as follows: T1W (TR/TE=600/12ms, matrix=320×240, FOV=24cm, thickness=2mm), T2W (TR/TE=4000/76ms, matrix=384×384, FOV=22cm, thickness=3mm), and FLAIR (TI/TR/TE=2600/10030/135ms, matrix=320×320, FOV=24cm, thickness=2mm). Post-contrast T1W images (T1Gd) were acquired following administration of a gadolinium-based contrast agent. Using previously described methods, MRI coordinates for image-localized biopsies were captured and co-registered with multimodal preoperative imaging (18). Biopsy samples from regions that were T1Gd hyperintense were labeled contrast-enhancing (CE), and those from regions that were T1Gd isointense and T2/FLAIR hyperintense were considered non-enhancing (NE).
Deconvolving cell populations from RNA sequencing
While many have employed single-cell RNA sequencing as a means of characterizing intratumoral heterogeneity, this approach notoriously undersamples non-neoplastic populations (e.g., neurons) (19). Single nucleus RNAseq (snRNAseq) circumvents this limitation by more comprehensively sampling malignant, immune, and other cells. We leveraged a snRNAseq data set (20), derived from seven patients with GBM, which contained eighteen cell types: six malignant, five immune, and seven “other.” Malignant cell populations included mesenchymal/immunoreactive (gl_Mes1 and gl_Mes2), proliferative (gl_Pro1 and gl_Pro2), and neural/oligodendrocyte precursors (gl_PN1 and gl_PN2). Immune cells were characterized as T-cells, myeloid cells present at baseline (Myel1), proliferative tumor-associated macrophages (prTAM), monocyte-derived cells (moTAM), and microglia-derived cells (mgTAM). Other populations were characterized as neurons, endothelial cells, oligodendrocytes, oligodendrocyte precursor cells (OPCs), and three subtypes of astrocytes (Ast1, protoplasmic; Ast2 and Ast3, reactive).
A bulk RNAseq transcriptome reflects the sum of transcriptional contributions from individual cells. Therefore, the cellular composition of heterogeneous samples can be estimated using deconvolution methods like CIBERSORTx (21). This method relies on marker gene input and assumes that expression signals are linearly additive, such that marker gene expression is related to the proportion of that cell type in the sample. Each cell in our snRNAseq matrix was labeled as one of eighteen cell populations (20). This labeled matrix served as the input to CIBERSORTx, where a signature matrix was derived to define the unique features of each cell type. This signature was applied to the bulk RNAseq samples to estimate the abundance of all eighteen cell types in our image-localized samples. The analysis was performed in absolute mode with 500 permutations and quantile normalization disabled. S-mode batch correction was applied to account for the application of a signature derived from UMI-based snRNAseq to bulk RNAseq data. We present CIBERSORTx output as absolute and relative (normalized to the total of its category: glioma, immune, or other) values.
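The linear-additivity assumption can be made concrete with a toy example. The sketch below is not the CIBERSORTx algorithm; it substitutes a simple non-negative least squares fit, solved by projected gradient descent, to show how abundances can in principle be recovered when a bulk profile is the sum of cell-type contributions. The signature matrix, the choice of three cell types, and all numerical values are hypothetical and chosen only for illustration.

```python
# Toy illustration of linear-additive deconvolution (NOT the CIBERSORTx algorithm).
# The signature matrix and abundances below are hypothetical values.

def mat_vec(matrix, vector):
    """Multiply a (genes x cell types) matrix by a vector of cell-type abundances."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def deconvolve(signature, bulk, n_iter=2000):
    """Estimate non-negative abundances f such that signature * f approximates bulk."""
    n_genes = len(signature)
    n_types = len(signature[0])
    # A step of 1 / trace(S^T S) never exceeds 1 / (largest eigenvalue), so descent is stable.
    step = 1.0 / sum(x * x for row in signature for x in row)
    f = [0.0] * n_types
    for _ in range(n_iter):
        residual = [p - b for p, b in zip(mat_vec(signature, f), bulk)]
        gradient = [sum(signature[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        # Gradient step followed by projection onto the constraint f >= 0.
        f = [max(0.0, fk - step * gk) for fk, gk in zip(f, gradient)]
    return f


# Hypothetical signature: rows are marker genes, columns are cell types.
cell_types = ["gl_Pro1", "moTAM", "neuron"]
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [1.0, 1.0, 1.0],
]
true_abundance = [0.5, 0.3, 0.2]

# Under linear additivity, the bulk profile is the abundance-weighted sum of signatures.
bulk = mat_vec(signature, true_abundance)

estimate = deconvolve(signature, bulk)
absolute = dict(zip(cell_types, estimate))
```

In this noise-free toy case the estimated abundances match the values used to build the bulk profile. Real data contain noise, platform differences, and collinear signatures, which is why dedicated methods and batch correction are needed.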
Trajectory inference and pseudotime ordering of TCGA microarray and image-localized RNAseq samples
The objective of trajectory inference is to reduce high-complexity, asynchronous data into an ordered, one-dimensional path. While it was initially described in the context of ordering single-cell RNAseq data to infer the biological progression of cell lineages (22), it has been successfully repurposed to determine the relative ordering of other types of complex data (23). In Monocle, data dimensionality is first reduced using independent component analysis (ICA) (24). The data are then represented by a series of nodes (cells) and edges (weighted by the distance between samples after ICA). A manifold learning algorithm (DDRTree) uses reverse graph embedding to identify potential backbones of the trajectory and orders samples along the longest path. Each sample is assigned to a branch on the tree, and pseudotime is calculated using the geodesic distance to the path’s starting point. Here we used TCGA microarray and our biopsy RNAseq as inputs to visualize GBM’s natural transcriptional organization. States were defined as each of the three concurrent line segments. The “fgsea” package was used to perform gene set enrichment analysis, with the MSigDB database as the reference (25, 26).
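The notion of pseudotime as a geodesic distance can be illustrated on a small tree. The following sketch is not Monocle or DDRTree; it assumes a trajectory tree has already been learned and simply computes path lengths from a chosen root and locates the branch point. The node names and edge weights are hypothetical.

```python
# Toy illustration of pseudotime as geodesic distance along a tree
# (NOT Monocle or DDRTree; the tree below is hypothetical).
import heapq


def geodesic_distances(tree, root):
    """Shortest path length from root to every node of a weighted, undirected graph."""
    distances = {root: 0.0}
    queue = [(0.0, root)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in tree[node]:
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


def branch_points(tree):
    """Nodes with three or more neighbors, where the trajectory splits."""
    return [node for node, edges in tree.items() if len(edges) >= 3]


def add_edge(tree, a, b, weight):
    """Add an undirected, weighted edge to the adjacency-list tree."""
    tree.setdefault(a, []).append((b, weight))
    tree.setdefault(b, []).append((a, weight))


# Hypothetical trajectory: a trunk (s0 -> s1 -> s2) that bifurcates at s2.
tree = {}
add_edge(tree, "s0", "s1", 1.0)
add_edge(tree, "s1", "s2", 2.0)
add_edge(tree, "s2", "s3", 1.5)  # first branch
add_edge(tree, "s2", "s4", 0.5)  # second branch
add_edge(tree, "s4", "s5", 1.0)

pseudotime = geodesic_distances(tree, root="s0")
splits = branch_points(tree)
```

Here the three segments meeting at the branch point play the role of the three states: the trunk from the root to the split, and the two arms that leave it.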
Statistical analyses
The TCGA cohort contains one sample per patient and thus each sample can be considered independent. Chi-square tests were used to determine the significance of differences between groups. Since the image-localized biopsy cohort contains multiple samples per patient, linear mixed-effects models coupled with ANOVA tests (Type II Wald tests) were used to determine the significance of differences in this cohort, with cumulative plots and box plots employed for data visualization (27). For categorical outcome variables, groups were binarized. A dummy variable of patient number was included as a random effect in each test within the lmer and glmer functions in the “lme4” R package (28, 29) for continuous and categorical dependent variables, respectively. The “car” package in R was used for ANOVA tests (30). The “rmcorr” package in R was used to determine the significance of correlations between CIBERSORTx populations alongside the Bonferroni method to adjust for multiple comparisons (31).
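For readers less familiar with these tests, the two simplest pieces of the analysis, the chi-square statistic for independent samples and the Bonferroni adjustment, can be sketched in a few lines. This is an illustration only and not the analysis code used in this study (which relied on the R packages named above); the contingency table and p-values are hypothetical, and the mixed-effects models are not reproduced here.

```python
# Toy illustration of a chi-square test statistic and a Bonferroni adjustment.
# The counts and p-values below are hypothetical.

def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: rows are two groups, columns are states A, B, and C.
table = [
    [30, 20, 10],
    [20, 20, 20],
]
statistic, dof = chi_square_statistic(table)

adjusted = bonferroni([0.01, 0.04, 0.5])
```

The statistic is then compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value.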
Results
Trajectory inference of GBM samples reveals three polarized states
To discern GBM’s transcriptional continuum, a manifold learning algorithm inferred the biological trajectory of TCGA GBM microarray samples (n=525). This approach reduces data dimensionality, computes the minimum spanning tree trajectory, and orders samples along the longest path (Figure 1A). Pseudotime, the geodesic distance of a point from the trajectory’s origin (pseudotime=0), reveals an initiating trunk that bifurcates into two paths (Figure 1B). There is no sex difference in pseudotime distribution (p=0.19; Figure 1C). The three concurrent line segments of the trajectory are labeled as “states” A, B, and C (Figure 1D). Each state has a unique transcriptional profile of the top 1000 most highly variable genes (Figure 1E). Pre-ranked GSEA, using known transcriptional subtype signatures as a reference, reveals that state A is concordant with neural and proneural signatures, state B is more mesenchymal, and state C agrees with the classical signature (Figure 1F). TCGA sample classification overlaid on the trajectory confirms that each state is associated with a transcriptional subtype (χ²=239, p<0.0001; Figure 1G). Relative distributions of age, IDH1 mutation status, and MGMT methylation status are shown in Figure S1A-D.
Bulk transcriptomic data from 134 image-localized biopsies (median=3 per patient) from 39 patients with primary (n=30) or recurrent (n=9) GBM underwent trajectory inference. Patient demographics and biopsy characteristics are summarized in Table S1. The inferred biological trajectory of these samples again reveals three concurrent segments that recapitulate those seen with TCGA (Figures 2A-B). In this cohort, samples from female patients are biased towards lower pseudotime values (55.2 vs. 71.7, p=0.002; Figure 2C). There is no difference in IDH status between the states (χ²=1.19, p=0.55; Figure S1E).
States reflect transitions in cellular population ecologies
To interpret how tumor composition contributes to this trajectory, each image-localized sample underwent RNA sequencing deconvolution to estimate the abundance of the individual cell populations (Figure S2A; Table S2). Eighteen cell populations, as defined by snRNAseq (20), were predicted: six malignant glioma, five immune (e.g., myeloid cells, T-cells), and seven “other” (e.g., astrocytes, neurons, endothelial cells). The sums of malignant cells, immune cells, and other cells are highest in states C, B, and A, respectively (Figure 2D). The trajectory as labeled by each cellular component can be found in Figure S2B.
When ordering sample composition heatmaps by pseudotime, we find that transitions between states correlate with changes in composition (Figure 2E). Ecological diversity measures (e.g., Shannon entropy and evenness) have lower values at the distal poles of each state’s segment compared to the transitional arms (Figure S2B). Taken together, these data suggest that GBM samples lie on a spectrum between three polarized states: invaded brain, immunoreactivity, and proliferation. Any sample can be a transitional admixture between these states by way of changes in its underlying cellular composition. As such, the abundances of certain cell populations tend to correlate with one another. Sex-specific cell abundance correlation networks are quantified in Figure 2F and overall cohabitation trends are schematized in Figure 2G.
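The diversity measures mentioned above can be computed directly from a sample’s estimated cell abundances. The sketch below shows Shannon entropy and, as an assumption, Pielou’s evenness (entropy divided by its maximum for the number of populations present); the text does not specify which evenness index was used, and the two example compositions are hypothetical.

```python
# Toy illustration of ecological diversity measures on cell-abundance vectors.
# Pielou's evenness is an assumed choice; the compositions are hypothetical.
import math


def shannon_entropy(abundances):
    """Shannon entropy (natural log) of the proportions implied by the abundances."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(abundances):
    """Shannon entropy divided by its maximum, log(number of populations present)."""
    n_present = sum(1 for a in abundances if a > 0)
    if n_present < 2:
        return 0.0  # a single population has no evenness to measure
    return shannon_entropy(abundances) / math.log(n_present)


# Hypothetical samples: one near a pole (dominated by one population),
# one on a transitional arm (a more even admixture).
polar_sample = [0.90, 0.05, 0.05]
transitional_sample = [0.40, 0.30, 0.30]

polar_entropy = shannon_entropy(polar_sample)
transitional_entropy = shannon_entropy(transitional_sample)
polar_evenness = pielou_evenness(polar_sample)
transitional_evenness = pielou_evenness(transitional_sample)
```

A sample dominated by one population yields lower entropy and evenness than a mixed sample, which is the pattern described for the distal poles versus the transitional arms.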
Contrast-enhancing samples contain more malignant cells than non-enhancing samples
Since measures of contrast enhancement (CE; from the T1Gd+ core) and non-enhancement (NE; from the T1Gd-/FLAIR+ penumbra) are central to surgical planning and the radiological assessment of treatment responsiveness, we were interested in the relationship between enhancement and tumor biology (i.e., pseudotime and cellular composition). State A samples tend to be NE, while samples from states B and C tend to be CE (p=0.00034; Figure 3A). Tumor status (i.e., primary or recurrent) has no significant relationship to states (χ²=0.39, p=0.82; Figure 3A). Overall cellular composition varies widely within and between CE and NE samples but is more predictable within states (Figure 3B).
The absolute abundance of malignant cells is higher in samples harvested from CE than NE regions for each sex individually (p=0.0053 males, p=0.021 females; Figure 3C). This remains true when considering all data together (p=0.00025; Figure S3A) or separating biopsies collected from primary (p=0.037; Figure S3B) and recurrent tumors (p<0.0001; Figure S3C). This finding was also seen within males for primary biopsies alone (p=0.027; Figure S3D), with an analogous but non-significant difference in the recurrent setting (p=0.081). Within females, the result did not hold for primary samples alone (p=0.65) but did for recurrent samples alone (p<0.0001; Figure S3E). Amongst NE samples, there are fewer malignant cells in recurrent compared to primary biopsies (p=0.0045; Figure S3F).
To determine if specific populations are driving the relationship between contrast enhancement and malignant cell abundance, each subtype was considered individually. There was an overrepresentation of gl_Mes2 cells in CE compared to NE biopsies (p=0.0002; Figure S3G). Notably, an increased abundance of gl_Pro1 cells in all CE samples (p=0.0028; Figure S3G) was driven by female patients (p=0.00059; Figure 3D). Conversely, NE biopsies have more gl_PN1 cells than their CE counterparts (p<0.0001; Figure S3G), which was also the case within male (p=0.026; Figure 3D) and female samples (p=0.00010; Figure 3D). There were no significant regional differences in gl_Pro2 across all samples or by sex.
Some immune cells have sex-specific imaging correlates
Although GBM is known as a “cold” tumor with little lymphocytic infiltration, tumor-associated macrophages and other immune cells are abundant and can account for up to half of a sample (32, 33). Thus, we were interested in the ability of contrast enhancement to discern the presence of immune cells. We observed a difference in the total abundance of immune cells between NE and CE imaging regions for females but not males (Figure 3E).
We next considered each population individually under the assumption that the imaging representations of immune activity could be sex-dependent. Contrast enhancement had no significant relationship to Myel1 cells, T-cells, or prTAMs. In females, mgTAMs were more likely to be present in NE samples (p=0.0019; Figure 3F), with significance retained only in the recurrent subcohort (p=0.00058; Figure S4A). In both sexes combined, moTAMs were significantly more abundant in CE than NE regions (p=0.00063; Figure S4B), a relationship that was retained only in primary samples (p=0.0078; Figure S4C).
MRI contrast enhancement corresponds with the abundance of some normal brain cell populations
NE tumor regions are expected to have fewer malignant cells and more normal and/or reactive CNS populations (34). We observe more neurons in NE samples (p<0.0001; Figure S5A), the significance of which is retained in both male and female samples (p=0.00035 and p=0.0046, respectively; Figure 3H), and primary and recurrent settings (p=0.00092, Figure S5B; p=0.00024, Figure S5C, respectively). Ast1 prevalence in NE (p<0.0001; Figure S5A) is not recapitulated in male primary GBM biopsies (p=0.36) but is driven by females in the primary setting and males in the recurrent setting (p=0.019, Figure S5D; p=0.014, Figure S5E, respectively). Endothelial cells, which line the neovasculature from which contrast extravasates into tissue (35), were more abundant in CE compared to NE samples (p<0.0001; Figure S5A). This significance was retained in primary (p=0.00011; Figure S5B) and recurrent samples (p=0.018; Figure S5B). There were no cohort-wide significant relationships between imaging and Ast3, oligodendrocytes, or OPCs.
Overall tumor composition has little bearing on MRI contrast enhancement
To reflect on the connections between contrast enhancement, pseudotime state, and cellular composition, we share a case example. A 59-year-old female with primary IDH wild-type, MGMT methylated GBM underwent seven image-localized biopsies (Figure 4A; 5CE, 2NE). The biopsies are distributed across the pseudotime trajectory and all three states (Figure 4B). Inter-sample correlations of cellular composition, as enumerated by CIBERSORTx, revealed a spectrum of biological heterogeneity between the samples (Figure 4C). Biopsies with similar imaging appearance (i.e., both NE or both CE) were associated with a spectrum of cellular composition similarity (Figure 4D). Samples 2 and 3 have nearly identical cell compositions (R=0.95), but one is CE and one is NE. Conversely, samples 4 and 5 are compositionally dissimilar (R=0.07) and belong to different pseudotime states (B and C), yet they are both CE.
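The inter-sample comparison in this case example rests on correlating composition vectors. The sketch below assumes a Pearson correlation between two samples’ cell-abundance vectors; the specific correlation measure is an assumption, and the three composition vectors are hypothetical rather than values from this patient.

```python
# Toy illustration of inter-sample correlation of cellular composition.
# Pearson correlation is an assumed choice; the compositions are hypothetical.
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length abundance vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical abundances of four cell populations in three biopsies.
sample_a = [0.50, 0.30, 0.10, 0.10]
sample_b = [0.48, 0.32, 0.12, 0.08]  # compositionally similar to sample_a
sample_c = [0.10, 0.10, 0.30, 0.50]  # compositionally dissimilar to sample_a

r_similar = pearson_r(sample_a, sample_b)
r_dissimilar = pearson_r(sample_a, sample_c)
```

Two biopsies can therefore be scored as compositionally similar or dissimilar independently of whether they were labeled CE or NE, which is the comparison drawn in Figure 4D.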
Discussion
An unsupervised trajectory inference algorithm revealed that the GBM transcriptome is a continuum of three polarized states: invasive, immune/inflammatory, and proliferative. The states are conserved regardless of patient sex and tumor treatment status. The trajectory’s constrained nature may have implications for treatment, as transcriptional dynamics should be predictably modifiable as a function of therapy and one should theoretically be able to artificially transmute the tumor into a more favorable (less aggressive) state. This strengthens the case for adaptive therapy schemas (36), which are increasingly focused away from cytotoxic treatments and towards cytostatic population control approaches.
We next employed a cellular deconvolution strategy to predict the abundance of eighteen cell types amongst these states. Previous efforts to characterize GBM’s cellular composition in totality have been impeded by the limitations of single-cell RNA sequencing (e.g., under-representation of cell populations) (19). Recent methodological advancements (e.g., single nucleus RNA sequencing) circumvent these limitations and allow for a more complete characterization of all populations including malignant, immune, and other cells. We leveraged a unique cohort of GBM snRNAseq as training data for a cellular deconvolution algorithm (20). This paradigm allowed us to quantify all GBM cell subpopulations in a large number of bulk RNA sequencing samples, which, to our knowledge, has not been done before. In connecting cell abundances to the three tumor states, we observe that transitions between states directly correlate with changes in population ecologies. Many prior efforts to classify transcriptomics data have underappreciated the continuity of these low-dimensional spaces.
The widespread clinical utility of these results hinges on the ability to integrate them with routine clinical imaging. Several factors have historically hindered our ability to understand the relationships between tumor biology and MRI. First, since GBM has such profound intratumoral heterogeneity, biological data from tissue biopsies and corresponding imaging features cannot be aligned without knowing the exact location from which the sample was harvested. Since image-localized biopsy collection is a resource-intensive process, the vast majority of GBM samples are non-localized. Second, GBM has known sex differences in incidence, treatment responsiveness, and prognosis (37, 38). For this reason, its biological states, and the imaging representations associated with them, cannot be assumed to be the same for males and females. However, across all of biomedicine, few studies have considered sex differences in their design (39). Despite our lack of biological understanding of what imaging means due to these limitations, serial MRIs are paramount in the evaluation of GBM and many assumptions are made in their interpretation.
To address these limitations, we leveraged image-localized biopsies to directly connect MRI features with transcriptional states and cellular composition. A sample’s biological state is generally associated with measures of contrast enhancement or non-enhancement. Samples from regions of contrast enhancement had higher abundances of malignant cell populations, whereas non-enhancing samples generally contained more normal brain cells. These findings in isolation support the traditional notion that CE and NE represent proliferative tumor and invaded brain, respectively. However, an assessment of cellular composition as it relates to imaging and pseudotime revealed a more nuanced story.
Patterns of cellular cohabitation could most simply be described as a function of the pseudotime trajectory. However, these ecological patterns, as well as their relationships to CE and NE, varied based on patient sex and tumor treatment status. Notably, the polarization of the malignant gl_Pro1 population towards CE regions and gl_PN1 population towards NE regions was driven primarily by female samples. Further, enrichment for the Myel1 population was spatially skewed towards NE for males and CE for females. Taken together, these findings support the potential for sex-distinct imaging patterns of tumor ecologies.
Of particular clinical relevance, we presented a case example of a patient who underwent seven biopsies from both CE and NE regions (Figure 4). The cellular composition of these samples had very little bearing on contrast enhancement status. This result highlights the limitations of current methods of treatment assessment, where a tumor is considered to be responsive if the CE volume regresses. The inability of contrast enhancement alone to fully predict cellular composition motivates a role for advanced imaging strategies like radiomics. Radiomics is a quantitative approach to predict biological attributes from complex imaging features (40, 41). Early radiomics efforts focused on connecting an entire image to a biological prediction, thus failing to capture the spatial heterogeneity within individual tumors. More recently, our group and others have leveraged image-localized biopsies to directly connect spatially-resolved imaging features with tissue biology (42–48). The differences in bioimaging relationships seen in the primary and recurrent settings as well as between the sexes motivate the careful incorporation of these variables into radiomics studies. Elucidating these bioimaging relationships will provide the opportunity to non-invasively characterize tumors in their entirety and personalize each patient’s therapy to exploit the unique weaknesses of their tumor’s dominant state.
Acknowledgments
We are grateful to all of those who have contributed to elements of this work, particularly current and past members of the image analysis teams and the glioma biopsy protocol teams, including: Barrett Anderies, Jessica Bauer, Spencer Bayless, Hend Bcharach, Regina Becker, Sameer Channer, Brenden Doyle, Lysette Elsner, Lily Esaleh, Ashlyn Gonzales, Crystal Harris, Morgan Hatlestead, Ryan Hess, Sandra Johnston, Yvette Lassiter-Morris, Julia Lorence, Ashley Napier, Ashley Nespodzany, Lisa Paulson, Cassandra Rickertsen, Sejal Shanbhag, Sarah Van Dijk, and Scott Whitmire.