Abstract
Before squamous cell lung cancer develops, pre-cancerous lesions can be found in the airways. From longitudinal monitoring, we know that only half of such lesions become cancer, whereas a third spontaneously regress. While recent studies have described the presence of an active immune response in high-grade lesions, the mechanisms underpinning clinical regression of pre-cancerous lesions remain unknown. Here, we show that host immune surveillance is strongly implicated in lesion regression. Using bronchoscopic biopsies from human subjects, we find that regressive carcinoma in-situ lesions harbour more infiltrating immune cells than those that progress to cancer. Moreover, molecular profiling of these lesions identifies potential immune escape mechanisms specifically in those that progress to cancer: antigen presentation is impaired by genomic and epigenetic changes, TGF-beta signalling is overactive, and the immunomodulator TNFSF9 is downregulated. Changes appear intrinsic to the CIS lesions as the adjacent stroma of progressive and regressive lesions are transcriptomically similar. This study identifies mechanisms by which pre-cancerous lesions evade immune detection during the earliest stages of carcinogenesis and forms a basis for new therapeutic strategies that treat or prevent early stage lung cancer.
Before the development of lung squamous cell carcinoma (LUSC), pre-invasive lesions can be observed in the airways. These evolve stepwise, progressing through mild and moderate dysplasia (low-grade lesions) to severe dysplasia and carcinoma in-situ (CIS; high-grade lesions), before the development of invasive cancer(1). Markers of immune sensing and escape have been associated with increasing grade(2). However, longitudinal bronchoscopic surveillance of such lesions has shown that progression of pre-invasive lesions to cancer is not inevitable; only half of high-grade CIS lesions will progress to cancer within two years, whereas a third will spontaneously regress(3). Here, we integrate genomic, transcriptomic, epigenetic and imaging data across carefully phenotyped airway CIS lesions and adjacent stroma (Table S1; Extended Data Figure 1) to assess the role of immune surveillance in lesion regression. We identify key immune escape mechanisms enriched in pre-invasive lesions which later progressed to cancer. Understanding these mechanisms may offer new therapeutic strategies to induce regression and prevent the development of invasive disease.
Summary of analyses performed on each CIS sample. Due to technical limitations related to the small size of bronchoscopic biopsies, not all analyses were performed on all samples. Table S1 provides a detailed per-sample reference of the analyses performed. The methodology for selecting samples for each analysis modality is provided in the Methods.
To assess our hypothesis that lesion regression is driven by immune surveillance, we first performed immunohistochemistry (IHC) on 28 progressive and 16 regressive CIS lesions (Figure 1a-b). Regressive lesions showed higher concentrations of intra-lesional cytotoxic CD8+ (p=0.037; Figure 1c) but not CD4+ (p=0.25) or regulatory FOXP3+ (p=0.41) T cells. We then quantified immune cells in stromal regions adjacent to CIS lesions, but found no significant differences between progressive and regressive lesions for CD8+ (p=0.49), CD4+ (p=0.43) or FOXP3+ (p=0.64) cells. We then used a machine-learning approach to quantify lymphocytes from hematoxylin and eosin (H&E) stained slides in a much larger dataset of 113 samples, which similarly contained more infiltrating lymphocytes in regressive lesions (Figure 1c; p=0.023).
Immune cell infiltration of lung carcinoma-in-situ lesions. (a-b) Immunohistochemistry images of (a) progressive CIS lesion and (b) regressive CIS lesion with CD4+ cells stained in brown, CD8+ cells in red and FOXP3+ in blue. Immune cells are separately quantified within the CIS lesion and in the surrounding stroma. c) Combined quantitative immunohistochemistry data of CD4, CD8 and FOXP3 staining (n=44; 28 progressive, 16 regressive) with total lymphocyte quantification from H&E images (n=116; 69 progressive, 47 regressive) shown. We observe increased lymphocytes (p=0.023) and CD8+ cells (p=0.037) per unit area of epithelium within regressive CIS lesions compared to progressive. Stromal regions adjacent to CIS lesions showed no significant differences in immune cells between progressive and regressive lesions. p-values are calculated using linear mixed effects models to account for samples from the same patient; *<0.05.
For a broader assessment of transcriptomic differences between CIS lesions and their adjacent stroma, we isolated epithelial tissue and paired stroma separately using laser capture microdissection for 10 progressive and 8 regressive CIS lesions. Similarly to IHC data, cell type deconvolution analysis demonstrated higher infiltrating lymphocytes in regressive lesions (Figure 2a; p=0.0012), as did deconvolution of methylation data from 36 progressive and 18 regressive CIS lesions (Figure 2b; p=0.006). Comparing predictions for individual cell types across gene expression and methylation data found an increase in most immune cell types in regressive lesions compared to progressive, with the exception of macrophages – a potentially immunosuppressive cell type – which were more abundant in progressive lesions (p=0.005; Table S2).
Identification of immune ‘hot’ and ‘cold’ carcinoma in-situ lesions by immune cell clustering. Progressive and regressive lesions have significantly different gene expression-derived TIL scores (a; p=0.0012) and different immune cell percentages as derived from methylCIBERSORT (b; p=0.006). c) Immune cell quantification from gene expression data (n=18) using the method of Danaher et al. shows an ‘immune cold’ cluster (left) in which all lesions progressed to cancer, and an ‘immune hot’ cluster (right) in which the majority regressed. d) Similar clustering on methylation-derived cell subtypes using methylCIBERSORT (n=54) again shows two distinct clusters: an ‘immune cold’ cluster (left) dominated by a cancer cell signature, in which all but one lesion progressed, and an ‘immune hot’ cluster (right), containing both progressive and regressive samples. p-values are calculated using mixed effects models to account for samples from the same patient.
Analysis of pro- and anti-inflammatory cytokine expression within the epithelial compartment demonstrated an increase in pro-inflammatory (p=1.2×10−5) but not anti-inflammatory (p=0.3) response in regressive lesions compared to progressive (Extended Data Figure 2). IFNG, IL2 and TNF were all increased in regressive lesions (Extended Data Figure 3). IL10 was also increased in regressive lesions; whilst classically considered an anti-inflammatory cytokine, IL10 has been shown to stimulate anti-tumor immunity(4). Only CXCL8 was upregulated in progressive samples compared to regressive (p=1.8×10−5); produced by macrophages, the expression of CXCL8 correlated strongly with macrophage quantification from deconvoluted gene expression data (r2=0.62, p=0.007). Taken together, these data are in keeping with a model in which inflammation via IFN-γ, IL-2 and TNF fosters effective immune surveillance, whilst lesion-associated macrophages – similar to tumor-associated macrophages in advanced cancers – have an immunosuppressive effect.
Comparing transcriptomic data from progressive CIS lesions (n=10) with regressive (n=8), we find that regressive lesions express higher levels of pro-inflammatory cytokines (a) but not anti-inflammatory cytokines (b) within the epithelium. The pro:anti-inflammatory ratio is higher in regressive lesions (c). Transcriptomic data from laser-captured stroma adjacent to the same lesions do not show any difference in cytokine expression between progressive and regressive lesions (d-f). Expression values shown are the geometric means of gene expression data for 9 pro-inflammatory and 7 anti-inflammatory cytokines. p-values are calculated using linear mixed effects modelling to account for samples from the same patient.
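The summary scores described in this legend can be illustrated with a minimal sketch. It assumes linear-scale (not log-transformed) expression values and uses short placeholder gene lists; the actual 9 pro-inflammatory and 7 anti-inflammatory cytokines are not listed here, and the function names and `ANTI_GENE_2` are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive expression values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def cytokine_scores(expression, pro_genes, anti_genes):
    """Summarise one sample as pro score, anti score and pro:anti ratio.

    expression: dict mapping gene symbol -> linear-scale expression value.
    pro_genes / anti_genes: gene symbols in each set (placeholders here).
    """
    pro = geometric_mean([expression[g] for g in pro_genes])
    anti = geometric_mean([expression[g] for g in anti_genes])
    return {"pro": pro, "anti": anti, "ratio": pro / anti}


# Illustrative values only; these are not data from the study.
sample = {"IFNG": 8.0, "TNF": 2.0, "IL10": 4.0, "ANTI_GENE_2": 1.0}
scores = cytokine_scores(sample, ["IFNG", "TNF"], ["IL10", "ANTI_GENE_2"])
```

The geometric mean is less sensitive than the arithmetic mean to a single highly expressed gene, which may be why it suits a multi-gene summary score. Group comparisons in the study additionally use linear mixed effects models, which this sketch does not attempt.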
Expression of individual cytokines in progressive and regressive CIS lesions. Continuing the analysis of transcriptomic data from progressive CIS lesions (n=10) with regressive (n=8) shown in Extended Data Figure 2, we demonstrate the contributions of individual pro-inflammatory cytokines (a) and anti-inflammatory cytokines (b). We see upregulation of several pro-inflammatory cytokines in regressive lesions: IFNG, IL12A, IL2, IL23A and TNF, as well as the classically anti-inflammatory cytokine IL10. CXCL8, which is associated with macrophages, is downregulated in regressive lesions. p-values are calculated using linear mixed effects modeling to account for samples from the same patient.
Recent advances have demonstrated heterogeneity of lung cancer immune infiltration, with patients whose tumors have more infiltrated ‘immune hot’ regions having improved survival as compared to those with abundant poorly infiltrated, ‘immune cold’ regions(5, 6). Hierarchical clustering of deconvoluted immune cell quantification at both the transcriptomic and epigenetic levels demonstrated clear clusters of ‘cold’ lesions, almost all of which progressed to cancer (Figure 2c-d). However, we also observed some ‘hot’ progressive lesions, suggesting the presence of other mechanisms in these lesions. We therefore sought to address two questions: firstly, could deficits in antigen presentation and immune recruitment in progressive lesions be identified, which could explain the observed ‘cold’ lesions? Secondly, could disordered immune cell function explain the existence of progressive immune ‘hot’ lesions?
The acquisition of mutations that result in clonal neoantigens drives T cell immunoreactivity in cancer(7). We hypothesised that immune-active regressive lesions may contain more neoantigens than progressive lesions; however, this was not supported by whole-genome sequencing data(8) (n=39). Predicted neoantigens correlated very closely with mutational burden (r2=0.94), and progressive lesions have been shown to have significantly higher mutational burden than regressive lesions(8); therefore, more neoantigens were identified in progressive than regressive lesions (p=0.077; Extended Data Figure 4a-b). This remained true when the analysis was limited to clonal neoantigens (p=0.034) and there was no difference in the proportion of neoantigens that were clonal (p=0.24) (Extended Data Figure 4c-d). Further, the ratio of observed to expected neoantigens was not different (p=0.94) and there were no significant differences in binding affinity (p=0.45) or differential agretopicity index (p=0.58; Extended Data Figure 4e-h); the putative neoantigens themselves were therefore not qualitatively different in the regressive group. The increased number of neoantigens identified in progressive lesions suggests that immune escape mechanisms must be active in these lesions; indeed, these antigens may act as a selection pressure to promote the development of immune escape(9). Importantly, no overlap in tumor neoantigens was observed between different patients, suggesting that vaccine-based approaches aiming to prevent progression will most likely need to be designed on a personalised basis.
Neoantigen analysis of progressive versus regressive lesions. Predicted neoantigen load correlates closely with mutational burden (a). Therefore, progressive samples, which harbor more mutations, have more neoantigens (b). This remains true when the analysis is limited to clonal neoantigens (c). The proportion of clonal neoantigens was similar (d). Considering the individual predicted neoantigens, there was no qualitative difference between progressive and regressive samples; they were similar in terms of binding affinity (e), rank binding affinity (f) and differential agretopicity index (DAI) (g). The ratio of observed to expected neoantigens (‘depletion score’) was similar between progressive and regressive lesions (h). The p-value for figure (a) was calculated using Pearson’s product moment; p-values for figures (b)-(h) were calculated using a Wilcoxon rank-sum test.
Given that neoantigens are present in progressive lesions, we assessed the ability of these lesions to present antigens to the immune system. Genomic, epigenetic and transcriptomic aberrations in genes involved in MHC Class I antigen presentation (Table S3) were more prevalent in progressive than regressive lesions (p=3.9×10−6; Figure 3; Table S4). Considering only genomic aberrations, these were more prevalent in progressive lesions (p=0.0009) and this remained true after correcting for overall mutational burden (p=0.01), suggesting that these mutations may be under positive selection. At least one genomic aberration in MHC-associated genes was found in 25/29 progressive lesions (86%) and 5/10 regressive lesions (50%); progressive lesions had a median of 6 such changes whereas regressive lesions had a median of 0.5. Loss of heterozygosity (LOH) in the HLA region, which is found in 61% of LUSC patients(10), was identified in 34% of patients with CIS lesions. Interestingly, a similar proportion of LUSC patients (28%) demonstrated clonal HLA LOH, suggesting that such clonal events occur before tumor invasion. We did not find a statistically significant difference in the prevalence of HLA LOH between progressive and regressive lesions (p=0.43) although numbers were small. Expression of HLA-A was reduced in progressive compared to regressive lesions (p=1.9×10−10).
Genomic, epigenetic and transcriptomic aberrations affecting antigen-presenting genes in lung carcinoma in-situ lesions. All samples are shown (n=78; 50 progressive, 28 regressive). For each gene involved in the MHC class I pathway, aberrations are shown in transcriptomic, epigenetic and genomic data in the top, middle and bottom rows, respectively. Three genes without any identified aberrations are excluded (CNX, HSPA, HSPC). Samples without data for a particular modality are marked in white. The bar chart shows the number of aberrations in these genes (orange) as a proportion of the total number of possible aberrations (grey), based on the number of profiling modalities performed on each sample. Transcriptomic over/underexpression is defined as a z-score above 2 or below −2. Similarly, hyper/hypomethylation is defined as a z-score above 2 or below −2, calculated from the mean methylation beta value across the gene. All samples with a genomic aberration passing filters are highlighted; low-impact mutations are excluded. LOH calls integrate data from ASCAT and LOHHLA. Using a mixed-effects model to account for samples from the same patient, aberrations in this pathway are more common in progressive than regressive lesions (p=3.9×10−6). Considering only genomic aberrations, these were more prevalent in progressive lesions (p=0.0009) and this remained true after correcting for overall mutational burden (p=0.01).
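The z-score thresholding and the bar-chart proportion described in this legend can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: z-scores are computed across whatever cohort of values is supplied (the reference distribution is not specified in the legend), and the function names and call labels are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-score each value against the mean and sample SD of the cohort."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("all values identical; z-scores undefined")
    return [(v - mu) / sd for v in values]


def call_aberrations(values, threshold=2.0):
    """Label each sample 'over', 'under' or 'none' for one gene.

    For expression, values are expression levels; for methylation, they
    would be the mean beta value across the gene for each sample.
    """
    calls = []
    for z in z_scores(values):
        if z > threshold:
            calls.append("over")
        elif z < -threshold:
            calls.append("under")
        else:
            calls.append("none")
    return calls


def aberration_fraction(calls_by_modality):
    """Aberrations as a fraction of possible aberrations for one sample.

    calls_by_modality maps a modality name to a list of per-gene calls,
    or to None when that modality was not profiled for the sample, so
    the denominator only counts modalities that were actually performed.
    """
    possible = 0
    aberrant = 0
    for calls in calls_by_modality.values():
        if calls is None:
            continue
        possible += len(calls)
        aberrant += sum(1 for c in calls if c != "none")
    if possible == 0:
        raise ValueError("no profiled modalities for this sample")
    return aberrant / possible


# Illustrative values only: one clear outlier among ten samples.
example_calls = call_aberrations([1.0] * 9 + [10.0])
example_fraction = aberration_fraction(
    {"expression": ["under", "none"], "methylation": None,
     "genomic": ["none", "mutated"]}
)
```

Normalising by the number of possible aberrations keeps samples with fewer profiled modalities comparable to those with complete data.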
Additionally, hypermethylation of the HLA region, which is well-described in invasive cancers(11, 12), was commonly observed, suggesting that epigenetic HLA silencing may be an important immune escape mechanism in pre-invasive disease. Genome-wide methylation analysis identified differentially methylated regions (DMRs) including a striking cluster of hypermethylation on chromosome 6(8) (Extended Data Figure 5), covering a region containing all of the major HLA genes. This cluster was also identified in analysis of 370 LUSC versus 42 control samples published by The Cancer Genome Atlas (TCGA)(13). Further analysis of TCGA data demonstrates strong evidence for epigenetic silencing of multiple genes in the antigen presentation pathway: mean methylation beta value over the gene is inversely correlated with expression for HLA-A (r=−0.32, p=2.5×10−10), HLA-B (r=−0.42, p<2.2×10−16), HLA-C (r=−0.18, p=3.6×10−4), TAP1 (r=−0.53, p<2.2×10−16) and B2M (r=−0.38, p=1.1×10−14). Similar trends were observed in CIS data (Extended Data Figure 6). The methylation pattern affecting these genes is predominantly promoter hypermethylation (Extended Data Figure 7).
Aberrant methylation of the HLA region is a feature of progressive CIS and cancer. (a) Differentially methylated regions across the genome, calculated for progressive vs regressive CIS (outer circle) and for cancer vs control (inner circle). Hypermethylated DMRs are plotted in yellow, hypomethylated in blue. Genes involved in the MHC class I mechanism are highlighted. In both comparisons a cluster is observed on chromosome 6, which includes all main HLA regions. (b) Selection of three probes covering the HLA-A gene, all showing marked hypermethylation in a subset of progressive samples and hence suggesting an epigenetic mechanism for reduced HLA-A in these samples.
Epigenetic silencing of antigen-presenting genes in squamous cell lung cancers. (a-f) Correlations of expression and methylation data from TCGA for key antigen-presentation genes demonstrates clear evidence of epigenetic silencing. Silencing is also seen for other cancer-associated genes such as WNT5A (g), suggesting that demethylating agents may have wider benefits than improving antigen presentation. However, some key immune genes including immunomodulatory molecule TNFSF9 (h) and MHC II regulator CIITA (i) show a positive correlation with methylation, suggesting that demethylating agents may not be universally beneficial on the immune response. Correlation coefficients shown are calculated using Pearson’s product moment.
Methylation patterns over antigen-presenting genes. Methylation patterns are shown for antigen presentation genes HLA-A, HLA-B, HLA-C, TAP1 and B2M, as well as the immunomodulator TNFSF9. Methylation data is generated from Illumina 450k microarrays, which measure methylation at 450,000 probes across the genome. In each plot, the x-axis shows the genomic location of each probe related to the gene of interest. On the y-axis, probe values are shown for each sample, coloured as progressive (red; n=36), regressive (green; n=18) or control (blue; n=33). Loess lines for each sample group are shown, with error bars in grey. We see a pattern of promoter hypermethylation in progressive samples for the majority of these genes, consistent with epigenetic silencing. An exception is TNFSF9 which shows predominantly body hypermethylation; this is consistent with the observation that hypermethylation of TNFSF9 increases expression.
Demethylating agents have been shown to promote immune activation through improved antigen presentation, immune migration and T cell activity(14–16). These data support the case for moving on-going trials of demethylating agents in combination with immunotherapy from advanced lung cancer(17, 18) into early disease. Additionally, several other cancer-associated pathways are known to be affected by methylation changes(8), therefore the benefits of these drugs may extend beyond immune activation. Nevertheless, we note with caution that some key immune genes demonstrate positive correlations in TCGA data between gene expression and methylation, including the immune co-stimulating ligand TNFSF9 (coding for 4-1BBL) (r2=0.32, p=1.7×10−10) and the MHC class II transcriptional activator CIITA (r2=0.39, p=2.5×10−15) (Extended Data Figure 6). Further studies will be required to demonstrate that immunological benefits of demethylating agents are not outweighed by effects on these important pathways.
Despite this evidence for impairment of antigen presentation mechanisms in CIS, we do observe ‘immune hot’ CIS lesions which progress to cancer. Next, we considered functional and microenvironment-related mechanisms to explain how these lesions were able to evade immune predation.
To study microenvironment effects on the immune response, we performed gene expression profiling on laser-captured stromal tissue taken from regions adjacent to CIS lesions. In contrast to data from gastrointestinal pre-invasive lesions(19), no genes were significantly differentially expressed on comparing stromal expression between progressive (n=10) and regressive (n=8) lesions when an FDR of <0.1 was applied. This result holds true with restricted hypothesis testing considering only genes that are related to immunity and inflammation (Figure 4a-b; Table S3).
Recent studies have identified TGF-beta signaling as a cause of T cell exclusion from tumors(20, 21), and as a potential therapeutic target(22). Whilst TGF-beta is variably expressed between progressive and regressive samples, the common downstream mediator SMAD4 is upregulated in progressive lesions, both in CIS tissue (p=0.023) and adjacent stroma (p=0.003; Figure 4c), potentially indicating increased TGF-beta signaling in progressive lesions. Supportive of this concept, we also observed an inverse correlation between stromal expression of a published fibroblast TGF-beta response (FTGFB) signature(22) and TIL gradient, defined here as (TIL score in tissue) – (TIL score in stroma) (r=−0.66; p=0.0029; Figure 4d). We therefore propose TGF-beta driven T cell sequestration as an additional immune escape mechanism in a subset of progressive cases. Additionally, we found upregulation of epithelial-mesenchymal transition (EMT)-related genes(23) in progressive samples, specifically those annotated as oncogenes or with dual oncogene/tumor suppressor roles (Figure 4e). EMT gene expression correlated with the FTGFB signature (Figure 4f), suggesting that the immune evasion role of TGF-beta may be mediated via dysfunctional EMT transcriptional signaling affecting the tumor microenvironment, as has been previously suggested(24).
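The TIL gradient defined above, and its correlation with a stromal signature score, can be written out as a short sketch. The function names are hypothetical and the input values are illustrative only; the sketch computes a plain Pearson coefficient and does not reproduce the mixed effects modelling used for the p-values reported in this study.

```python
import math


def til_gradient(tissue_til, stroma_til):
    """Per-lesion TIL gradient: tissue TIL score minus stromal TIL score."""
    if len(tissue_til) != len(stroma_til):
        raise ValueError("tissue and stroma lists must be paired")
    return [t - s for t, s in zip(tissue_til, stroma_til)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists of length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: a rising stromal signature paired with a
# falling TIL gradient gives a perfectly inverse correlation.
tissue = [5.0, 4.0, 3.0, 2.0]
stroma = [1.0, 2.0, 3.0, 4.0]
signature = [1.0, 2.0, 3.0, 4.0]
gradient = til_gradient(tissue, stroma)
r = pearson_r(signature, gradient)
```

A negative gradient indicates more lymphocytes in the stroma than in the lesion, which is the pattern expected if T cells are being held outside the epithelium.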
Immune escape mechanisms in CIS beyond antigen presentation. (a) Volcano plot of gene expression differential analysis of laser-captured stroma comparing progressive (n=10) and regressive (n=8) CIS samples. No genes were significant with FDR < 0.05 following adjustment for multiple testing. (b) Principal Component Analysis plot of the same 18 CIS samples, showing laser-captured epithelium and matched stroma. (c) TGF-beta signaling is increased in progressive samples, as evidenced by increased expression of the downstream gene SMAD4 (p=0.02) and of a fibroblast TGF-beta response (FTGFB) signature measured in matched stroma (p=0.05). The FTGFB signature, as a proxy for TGF-beta signaling, correlates inversely with TIL gradient, defined as tissue TIL score – stromal TIL score (d; r=−0.66, p=0.003). (e) EMT genes are upregulated in progressive compared to regressive samples. Specifically, we see upregulation of genes annotated as oncogenes (p=2.4×10−5) and dual oncogene/tumour suppressor functions (p=2.6×10−5) but not tumour suppressor genes (p=0.62). In each case we compare the geometric mean of genes in a published gene set for each sample. (f) Expression of EMT genes correlates well with the FTGFB signature (r=0.49, p=0.04). (g-h) On differential analysis of 28 immunomodulatory molecules, only TNFSF9 was significantly differentially expressed, being downregulated in progressive lesions (FDR 4.3×10−5). There was no corresponding change in the TNFRSF9 receptor. A comparison of ligand:receptor ratios for known cytokines identified only CCL27:CCR10 as upregulated in progressive samples (FDR 0.003). All p-values are calculated using linear mixed effects modeling to account for samples from the same patient; ***p < 0.001 **p < 0.01 *<0.05 #<0.1. Units for gene expression figures represent normalised microarray intensity values.
To identify differences in cytokine responses between progressive and regressive lesions, we calculated the ligand:receptor mRNA expression ratio for 52 known cytokine:receptor pairs(25). Only one, CCL27:CCR10, was significant with FDR < 0.01 (fold change 1.55, FDR 0.003); progressive samples express more CCL27 (p=2.6×10−6) and less CCR10 (p=0.1×10−4) than regressive (Figure 4g-h). CCL27:CCR10 signaling has been associated with immune escape in melanoma through PI3K/Akt activation in a mouse model(26); in CIS, CCL27 expression correlates with expression of both PIK3CA (r2=0.61, p=0.008) and AKT1 (r2=0.68, p=0.002) (Extended Data Figure 8). CCL27 is minimally expressed in both normal lung tissue and invasive squamous cell lung cancer(13, 27), suggesting that this effect is specific to early carcinogenesis and therefore warrants further investigation as a target for preventative therapy.
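A minimal sketch of the two steps implied by this analysis, computing a ligand:receptor ratio per sample and adjusting a set of per-pair p-values for multiple testing, is given below. It assumes linear-scale expression values and uses the Benjamini-Hochberg procedure as one common way of obtaining FDR values; the text does not state which adjustment method was used, and the function names and example numbers are hypothetical.

```python
import math


def log2_ratio(ligand, receptor):
    """log2 of the ligand:receptor expression ratio for one sample.

    Assumes strictly positive, linear-scale expression values.
    """
    if ligand <= 0 or receptor <= 0:
        raise ValueError("expression values must be strictly positive")
    return math.log2(ligand / receptor)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative values only; these are not results from the study.
ratio = log2_ratio(8.0, 2.0)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in fdr]
```

With 52 pairs tested, some adjustment of this kind is needed because roughly two or three pairs would be expected to reach p<0.05 by chance alone.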
The CCL27:CCR10 axis is upregulated in progressive samples and correlates with PI3K/AKT expression. We compared ligand:receptor expression for each of 52 known cytokine:receptor pairs in 18 CIS lesions (n=10 progressive, 8 regressive). Only CCL27:CCR10 was significantly different between progressive and regressive lesions (FDR 0.003; Figure 4). Progressive samples showed upregulated CCL27 and downregulated CCR10. CCL27 activation of CCR10 has been shown to promote immune escape in mouse models, with the PI3K/Akt pathway implicated as a potential mechanism. In CIS data, CCL27 expression correlates with expression of both PIK3CA (a) and AKT1 (b). Correlation coefficients are calculated using Pearson’s product moment.
Targeting immunomodulatory molecules such as PD-1 now forms part of first-line lung cancer management(28). To investigate the role of such molecules in pre-invasive immune escape, we performed differential expression analysis between progressive and regressive lesions, focused on 28 known immunomodulatory genes (Table S3). TNFSF9 (4-1BBL, CD137L) was significantly downregulated in progressive lesions (FDR=4.34×10−5; Figure 4g-h) with no corresponding change identified in its receptor TNFRSF9 (FDR=0.6). TNFSF9 promotes activation of T cells and natural killer (NK) cells(29); in CIS lesions TNFSF9 expression correlates with cytotoxic cell (r2=0.77, p=0.0002) and NK cell infiltration (r2=0.54, p=0.02), as predicted from gene expression data. Agonists of the TNFSF9 receptor have been shown to be clinically efficacious in several cancers(30–32) and these data support their investigation in targeted early lung cancer cohorts. Furthermore, individual lesions showed notably high or low expression of other immunomodulatory genes, raising the possibility that other immunomodulators may be targets for therapy in individual cases (Extended Data Figure 9).
Comparisons of immune checkpoint molecules between progressive and regressive CIS samples. Here we show gene expression values of immune checkpoint molecules for each individual CIS lesion, showing both progressive (red; n=10) and regressive (blue; n=8). Although only TNFSF9 reaches a significance threshold of FDR < 0.05 on differential expression analysis, other genes show outlier samples in the progressive group. Defects in these genes may be a critical immune escape mechanism in these outlier samples.
Our previous work highlighted occasional cases of ‘late progressive’ lesions, which met a clinical endpoint of regression (defined by the subsequent biopsy at the same site showing resolution to normal epithelium or low-grade dysplasia) but the index CIS biopsy had the molecular appearance of a progressive lesion, and it indeed subsequently developed cancer months or years later. Clinical review identified 11 lesions across the 53 regressive lesions in our current cohort (20.7%) that at later clinical follow up subsequently progressed to cancer, and hence are termed ‘late progressive’. These included 4 previously published lesions subjected to whole-genome sequencing and/or methylation and shown to display the genomically unstable appearance of progressive lesions, as well as 7 with immunohistochemistry data and 10 with lymphocyte quantification performed from H&E slides (Table S1; Extended Data Figure 1). Interestingly, based on these data, late progressive lesions appear immunologically similar to regressive lesions, showing increased infiltration with lymphocytes and CD8 cells compared to progressive lesions (Extended Data Figure 10).
Of 53 lesions that met the clinical endpoint for regression – defined as a subsequent biopsy showing normal epithelium or low-grade dysplasia – 11 developed cancer later at the same site. These are termed ‘late progressive’ lesions. Combined quantitative immunohistochemistry data (n=44; 28 progressive, 16 regressive) with lymphocyte quantification from H&E images (n=116; 69 progressive, 47 regressive) are shown. We observe a similar trend of increased lymphocytes (p=0.06) and CD8+ cells (p=0.08) in regressive and late progressive samples compared to progressive. We also observe increased stromal lymphocytes in the late progressive group (p=0.02). Quoted p-values are calculated using ANOVA to reject the null hypothesis that all groups are equal, based on a linear mixed model to correct for multiple samples per patient; *<0.05, #<0.1. Post-hoc pairwise comparisons using a Tukey HSD test were performed but sample size was insufficient to show significant results.
Whilst we acknowledge that sample numbers are small when examining subgroups of regressive lesions in this way, our data support a model in which lesions should be considered on two axes: genomic stability and immune competence. Our previous work predicts that chromosomally unstable lesions will usually progress, implying that they have escaped immune predation. Yet some may regress if they remain immune competent, only to later progress, potentially because their genomic instability makes them more likely to evolve immune escape mechanisms during regression, and hence become ‘late progressors’. Of 11 late progressors in this cohort, median time from regressive index biopsy to progression was 3.2 years (range 0.8-4.6 years). This time period represents a change from a point of known immune competence to demonstrated immune escape. Hence, we might estimate that a successful therapeutic strategy to block a particular immune escape mechanism might delay the onset of cancer by around 3 years. Of the remaining 42 regressive samples in this cohort, median follow-up time was 4.73 years (range 0.42-13.5 years), suggesting that genomically ‘stable’ samples are likely to regress and remain regressed long-term. Given their immunological competence, late progressors are included in the regressive cohort when analysing immune escape mechanisms in this study.
In summary, we present evidence that immune surveillance may play a critical role in spontaneous regression of pre-cancerous lesions of the airways. We identify mechanisms of immune escape present before the point of cancer invasion, many of which offer potential therapeutic targets. Analysis of ‘late progressive’ samples provides insight into the dynamics of this process. These data present an opportunity to induce regression and prevent cancer development. Demethylating agents, 4-1BB agonists, CCL27 and TGF-beta blockade are therapeutic candidates that warrant further research. As a result of field carcinogenesis, patients with pre-invasive lesions are at risk of synchronous cancers at other sites, which are likely to be clonally related(8, 33) and therefore may benefit from systemic immunomodulatory treatment. The data presented here support a new paradigm of personalised immune-based systemic therapy in early disease.
Methods
Ethical approval
All tissue and bronchial brushing samples were obtained under written informed patient consent and were fully anonymized. Study approval was provided by the UCL/UCLH Local Ethics Committee (REC references 06/Q0505/12 and 01/0148). All relevant ethical regulations were followed.
Cohort description and patient characteristics
For over 20 years, patients presenting with pre-invasive lesions, which are precursors of squamous cell lung cancer (LUSC), have been referred to the UCLH Surveillance Study. As previously described(1), patients undergo repeat bronchoscopy every four months, with definitive treatment performed only on detection of invasive cancer. Autofluorescence bronchoscopy is used to ensure the same anatomical site is biopsied at each time point. Gene expression, methylation and whole-genome sequencing profiling of carcinoma in-situ (CIS) samples from this cohort has been performed previously, and the data have been published(2). These data are used in this study.
All patients enrolled in the UCLH Surveillance Study who met a clinical end point of progression or regression were included; by definition they underwent an ‘index’ CIS biopsy followed by a diagnostic cancer biopsy (progression) or a normal/low-grade biopsy (regression) four months later. Index lesions were identified between 1999 and 2017. Cases meeting an end point of regression underwent clinical review to identify those which subsequently progressed; 11 samples (20.7%) were identified, which are described as ‘late progressors’ in the main text. Of these 11, median time from ‘regressive’ index biopsy to progression was 3.2 years (range 0.8-4.6 years), whilst the remaining 42 samples had a median follow-up time of 4.73 years (range 0.42-13.5 years). Whilst we cannot fully exclude that any regressive sample may later develop cancer, the fact that median follow-up in the study group was longer than the maximum time to progression in the late progression group suggests that late progression in included samples is unlikely.
All samples underwent laser capture microdissection (LCM) to ensure only CIS cells underwent molecular profiling. Methods for sample acquisition, quality control and mutation calling are as previously described, as are full details regarding patient clinical characteristics.
Briefly, gene expression profiling was performed using both Illumina and Affymetrix microarray platforms. Normalisation was performed using proprietary Illumina software and the RMA method of the affy(3) Bioconductor package respectively. This study includes 18 previously unpublished gene expression arrays from stromal tissue. These samples were collected using LCM to identify stromal regions adjacent to 18 already-published CIS samples (corresponding to the 18 samples undergoing Affymetrix microarray profiling described above). These new stromal samples underwent Affymetrix profiling using the exact same methodology as previously described for CIS tissue samples. To avoid issues related to batch effects between platforms, the analyses in this paper utilise only samples profiled on Affymetrix microarrays, which include both CIS and matched stromal samples. Methylation profiling was performed using the Illumina HumanMethylation450k microarray platform. All data processing was performed using the ChAMP Bioconductor package(4).
For both gene expression and methylation data, z-scores were used to identify significant aberrations. These were calculated using regressive samples as a reference cohort for gene expression data, and control brushings for methylation data. Whole genome sequencing data were obtained using the Illumina HiSeq X Ten system. A minimum sequencing depth of 40x was required. BWA-MEM was used to align data to the human genome (NCBI build 37). Unmapped reads and PCR duplicates were removed. Substitutions, insertions-deletions, copy number aberrations and structural rearrangements were called using CaVEMan(5), Pindel(6, 7), ASCAT(8) and Brass(9) respectively.
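The z-score step described above can be sketched as follows. This is a minimal Python illustration rather than the authors' R code: the function name, the toy expression values and the |z| > 2 cut-off are all assumptions, since the text does not state the threshold used.

```python
from statistics import mean, stdev


def reference_z_scores(values, reference):
    """Z-score each value against the mean and sample SD of a reference cohort."""
    mu = mean(reference)
    sd = stdev(reference)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("reference cohort has zero variance")
    return [(v - mu) / sd for v in values]


# Hypothetical expression values for a single gene (illustrative numbers only).
regressive_reference = [5.0, 6.0, 7.0, 6.0, 6.0]
progressive_samples = [6.0, 8.0, 3.0]

z = reference_z_scores(progressive_samples, regressive_reference)

# Flag aberrations using an assumed threshold of |z| > 2.
aberrant = [abs(score) > 2 for score in z]
```

For methylation data the same function would apply with control brushings supplied as the reference cohort.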
Comparison of Microarray Platforms
As described above, our previous work performed gene expression profiling using both Illumina and Affymetrix microarray platforms (GEO platform IDs GPL13534 and GPL18281 respectively), with Illumina data used for discovery analysis and Affymetrix as a validation set. Our previous publication did not identify clear differences in immune pathways between progressive and regressive lesions based on the Illumina discovery set, yet a similar analysis of the Affymetrix dataset does identify two significant immune-related KEGG pathways(10): cytokine-cytokine receptor interaction (hsa04060) and type I diabetes mellitus (hsa04940). We therefore questioned whether this disparity may be due to platform differences. The Affymetrix platform used has many more probes than the Illumina platform, allowing coverage of more genes and coverage of multiple transcripts for some genes. To examine the impact of these differences we performed pathway analysis on the Illumina and Affymetrix datasets separately, then repeated this analysis using only probes that were shared by both platforms and were unambiguous (i.e. had a one-to-one mapping to a given gene on both microarray platforms). Using a Gene Set Enrichment Analysis (GSEA) method, we found hsa04060 and hsa04940 to be significant in the Affymetrix dataset but not the Illumina dataset. Both of these pathways included genes which were not profiled in the Illumina dataset, and indeed when the Affymetrix dataset was reduced to include only shared unambiguous probes, hsa04940 was no longer significant and hsa04060 showed a smaller effect size. Chromosomal instability related genes, the most important finding from our previous work, remained significant across all analyses. Some genes which are important to our present analysis are not covered by the Illumina microarray, including TNFSF9, CXCL8 and CD274.
We believe these differences justify our decision to focus on the Affymetrix platform, as it offers wider coverage of important immune genes. Pathway analysis results are included in Supplementary Table 5.
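The restriction to shared, unambiguous probes can be illustrated with a small sketch. It assumes each platform's annotation is available as a mapping from probe ID to the gene symbols it targets. The probe IDs and the toy mappings below are hypothetical and do not reflect the real array annotations, and the function names are ours.

```python
from collections import Counter


def one_to_one_genes(probe_to_genes):
    """Genes targeted by exactly one probe, where that probe targets only that gene."""
    probes_per_gene = Counter(
        gene for genes in probe_to_genes.values() for gene in genes
    )
    return {
        genes[0]
        for genes in probe_to_genes.values()
        if len(genes) == 1 and probes_per_gene[genes[0]] == 1
    }


def shared_unambiguous_genes(platform_a, platform_b):
    """Genes with a one-to-one probe mapping on both platforms."""
    return one_to_one_genes(platform_a) & one_to_one_genes(platform_b)


# Toy annotations (hypothetical probe IDs and mappings, for illustration only).
illumina_toy = {
    "ILMN_1": ["TP53"],
    "ILMN_2": ["CCNB1"],
    "ILMN_3": ["GENE_A", "GENE_B"],  # ambiguous: one probe, two genes
}
affymetrix_toy = {
    "AFFX_1": ["TP53"],
    "AFFX_2": ["CCNB1"],
    "AFFX_3": ["CCNB1"],  # ambiguous: two probes, one gene
    "AFFX_4": ["TNFSF9"],  # absent from the toy Illumina annotation
}

shared = shared_unambiguous_genes(illumina_toy, affymetrix_toy)
```

In this toy case only TP53 survives the filter; pathway analysis would then be repeated on expression matrices subset to the shared genes.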
Sample selection for profiling
As previously described, all patients enrolled in the surveillance programme discussed above were considered for this study. For a given CIS lesion under surveillance, when a biopsy from the same site in the lung showed evidence of progression to invasive cancer or regression to normal epithelium or low-grade dysplasia, we defined the preceding CIS biopsy as a progressive or regressive ‘index’ lesion respectively. Due to the small size of bronchoscopic biopsy samples, not all profiling techniques were applied to all samples. Patients with Fresh Frozen (FF) samples underwent whole genome sequencing and/or methylation analysis depending on sample quality. Patients with formalin-fixed paraffin-embedded (FFPE) samples underwent gene expression analysis. Further detail is available in our previous manuscript(2). Additionally, any patient with an available FFPE block underwent image analysis as described below, and all patients with Affymetrix-based gene expression profiling underwent further profiling of laser-captured adjacent stroma.
Statistical Methods
Unless otherwise specified, all analyses were performed in an R statistical environment (v3.5.0; www.r-project.org/) using Bioconductor(11) version 3.7. Code to reproduce a specific statistical test is publicly available at the GitHub repository listed under Code Availability.
Unless otherwise stated, comparisons of means between two independent groups are performed using a two-sided Wilcoxon test. In some cases, multiple samples have been profiled from the same patient, although always from distinct sites within the lung. In such cases we used mixed effects models to compare means between groups, treating the patient ID as a random effect, as implemented in the lme4 R package(12), with p-values calculated using the Anova method from the car R package(13). Differential expression was performed using the limma(14) Bioconductor package to compare microarray data between two groups. When adjustment for multiple testing is required we quote a False Discovery Rate (FDR), which is calculated using the Benjamini-Hochberg method(15). Cluster analysis and visualization were performed using the pheatmap(16) R package.
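For readers unfamiliar with the Benjamini-Hochberg adjustment, the following is a minimal pure-Python sketch of the procedure. It is illustrative only and is not the authors' code, whose analyses were run in R; the function name and the example p-values are ours.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only.
p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(p)
```

Each raw p-value is scaled by the number of tests divided by its rank, and a running minimum from the largest p-value downwards ensures that a smaller raw p-value never receives a larger adjusted value.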
Image analysis
All slides were scanned using NanoZoomer Digital Pathology System scanner model C9600-01, using NDP.scan version 2.5.89 (Hamamatsu, Japan).
Four distinct cell types from H&E images were identified with an automated deep learning pipeline trained using 21,009 pathological annotations from NSCLC samples in the TRACERx100 cohort(17). The four classes correspond to cancer cells, lymphocytes that included leukocytes and plasma cells, stromal cells that included fibroblasts and endothelial cells, and an “other” cell type that included non-identifiable and less abundant cells such as macrophages, chondrocytes, and pneumocytes. A customised implementation of spatially constrained convolutional neural networks(18) for TensorFlow was used for the single cell classification and detection tasks. The deep learning pipeline was validated using 5,951 pathological annotations within TRACERx as well as 5,082 annotations collected externally on an independent cohort of 100 NSCLC cases from the LATTICe-A study(19). Biological validation of this algorithm against immunohistochemistry data has been previously described (submitted for publication).
IHC
2-5μm tissue sections were cut and transferred onto poly-l-lysine–coated slides, dewaxed in two changes of xylene and rehydrated in a series of graded alcohols. Details of the three primary antibodies used are as follows:
SP35: Anti-CD4 Rabbit monoclonal antibody from Spring Biosciences Inc., Pleasanton, CA, US.
SP239: Anti-CD8 Rabbit monoclonal antibody from Spring Biosciences Inc., Pleasanton, CA, US.
236A/E7: Anti-FOXP3 Mouse antibody, Kind gift from Dr G Roncador, CNIO, Madrid (Spain).
Single immunohistochemistry was carried out using the automated platforms BenchMark Ultra (Ventana/Roche) and the Bond-III Autostainer (Leica Microsystems) according to a protocol described elsewhere(20, 21). To establish optimal staining conditions (i.e. antibody dilution and incubation time, antigen retrieval protocols, suitable chromogen) each antibody was tested and optimized on sections of human reactive tonsil, used as positive control.
Multiplex immunohistochemistry was carried out using a protocol described previously(21). Co-expression of nuclear and cytoplasmic or membranous proteins was easy to detect, as the colour of the chromogens remained distinct. Specificity of the staining was assessed by a haematopathologist (TM) with expertise in multiplex-immunostaining. Slides were scanned using the Hamamatsu Nanozoomer digital scanner as described above.
For T cell subset quantification, a similar deep learning pipeline was used. The convolutional neural networks were trained on sample TRACERx IHC CD4/CD8/FOXP3 images using 9,333 pathological annotations and validated against 6 independent NSCLC images using 5,028 pathological annotations. The IHC algorithm classified cells into four classes: CD8+, CD4+, FOXP3+ and an “other” cell class (hematoxylin cells). When comparing cell counts between samples, absolute counts were divided by the region area. Regions of CIS and stroma within a slide were quantified separately, with regions annotated manually by the investigators.
Neoantigen prediction and LOHHLA
HLA typing was performed using Optitype(22) on germline (blood) WGS data from each patient. This was used as input for netMHCpan 4.0(23, 24) for neoantigen prediction; 9-, 10- and 11-mer peptides were considered for each somatic mutation, called using methods described above. To assess for quantitative differences between neoantigens in the progressive and regressive groups, we compared their binding affinities (as calculated by netMHCpan) and their differential agretopicity index (DAI), defined as the difference in binding affinity between mutant and wild-type peptides. Significant differences in these values were not observed between the regressive and progressive groups.
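The differential agretopicity index can be made concrete with a short sketch. It assumes binding affinities are expressed as predicted IC50 values in nM, where lower values indicate stronger binding, and it assumes the sign convention DAI = wild-type affinity minus mutant affinity, so that positive values mean the mutant peptide is predicted to bind more strongly. The text defines DAI only as the difference between the two, so this convention, the record layout and the values below are all assumptions for illustration.

```python
def differential_agretopicity_index(mutant_nm, wildtype_nm):
    """DAI under the assumed convention: wild-type IC50 minus mutant IC50 (nM)."""
    return wildtype_nm - mutant_nm


# Hypothetical predictions; peptide labels and affinities are invented.
predictions = [
    {"peptide": "peptide_1", "mutant_nm": 50.0, "wildtype_nm": 500.0},
    {"peptide": "peptide_2", "mutant_nm": 300.0, "wildtype_nm": 250.0},
]

dai_values = [
    differential_agretopicity_index(p["mutant_nm"], p["wildtype_nm"])
    for p in predictions
]
```

Under this convention the first peptide has a DAI of 450 nM (the mutation improves predicted binding) and the second a DAI of -50 nM. Per-sample distributions of such values could then be compared between the progressive and regressive groups.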
The same HLA typing data was used as input to the LOHHLA tool(25) (Loss of Heterozygosity in Human Leukocyte Antigen), alongside copy number, purity and ploidy data derived from ASCAT. This tool assesses each sample for the presence of LOH in the HLA region – a difficult task due to polymorphism in this region. Output plots from LOHHLA were visually checked prior to calling the presence or absence of HLA LOH in a sample.
DMR analysis
Methylation data analysis was performed using the Chip Analysis Methylation Pipeline (ChAMP) Bioconductor package with default settings(4). The functions champ.DMP() and champ.DMR() were used to identify differentially methylated probes (DMPs) and differentially methylated regions (DMRs) respectively. Annotation of DMPs and DMRs with affected genes is performed by default within these functions.
A criticism raised against this analysis is the identification of DMRs affecting a highly polymorphic region of chromosome 6. However, we argue that this is a differential analysis between two groups (progressive and regressive), with results replicated in an independent dataset from TCGA (Cancer vs Control data), and therefore should not be affected by polymorphism unless the underlying HLA types are significantly different between the two groups. For each identified HLA type, based on 4-digit resolution, we compared the number of patients identified in the progressive and regressive groups using a Fisher’s exact test, and did not find any HLA types to be significant with p < 0.05.
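The per-allele comparison can be sketched with a pure-Python two-sided Fisher's exact test on a 2x2 table of carriers and non-carriers in each group. This is an illustration under stated assumptions, not the authors' code: the function name is ours and the patient counts below are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts for one HLA type:
# 3 of 10 progressive patients carry it, versus 1 of 10 regressive patients.
p_value = fisher_exact_two_sided(3, 7, 1, 9)
```

The test would be repeated for each HLA type observed in the cohort, with carriers and non-carriers tabulated separately for each.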
Immune cell quantification from GXN data
To estimate relative immune cell populations from gene expression data we applied the method of Danaher et al.(26) This method was chosen as it has been shown to out-perform similar methods when benchmarked against immunohistochemistry in a large analysis of early-stage invasive lung cancer(27). Briefly, for each of 15 immune cell types, a small set of genes is defined which has been shown to correlate with the presence of that cell type. For each cell type, the mean expression of its associated genes gives a ‘score’ for that cell type. If a gene is not measured by the Affymetrix microarray used, that gene is ignored.
A ‘TIL score’, estimating the overall infiltration of lymphocytes into the tissue, is calculated by taking the mean of 10 individual cell type scores (B-cells, Cytotoxic cells, Exhausted CD8, Macrophages, Neutrophils, NK CD56dim cells, NK cells, T-cells, Th1 cells, CD8 T cells). This process is encoded in the R function do.danaher(), which is available from the Github repository accompanying this paper.
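The scoring scheme described above can be sketched as follows. This Python illustration is not the authors' do.danaher() R function. The marker gene names are placeholders, not the published gene sets, and the expression values are invented; only the list of ten cell types is taken from the text.

```python
from statistics import mean

# The ten cell types averaged into the TIL score, as listed in the text.
TIL_CELL_TYPES = [
    "B-cells", "Cytotoxic cells", "Exhausted CD8", "Macrophages", "Neutrophils",
    "NK CD56dim cells", "NK cells", "T-cells", "Th1 cells", "CD8 T cells",
]


def cell_type_scores(expression, marker_genes):
    """Mean expression of each cell type's markers, ignoring unmeasured genes."""
    scores = {}
    for cell_type, genes in marker_genes.items():
        measured = [expression[g] for g in genes if g in expression]
        if measured:  # skip cell types with no measured marker at all
            scores[cell_type] = mean(measured)
    return scores


def til_score(scores, cell_types=TIL_CELL_TYPES):
    """Mean of the per-cell-type scores; raises KeyError if any score is missing."""
    return mean(scores[c] for c in cell_types)


# Toy example with placeholder marker genes (not the real Danaher gene sets).
toy_markers = {"T-cells": ["GENE_A", "GENE_B"], "B-cells": ["GENE_C", "GENE_D"]}
toy_expression = {"GENE_A": 8.0, "GENE_B": 6.0, "GENE_C": 5.0}  # GENE_D unmeasured

toy_scores = cell_type_scores(toy_expression, toy_markers)
toy_til = til_score(toy_scores, cell_types=["T-cells", "B-cells"])
```

In the toy example the unmeasured GENE_D is simply dropped, so the B-cell score rests on GENE_C alone. A full analysis would supply marker sets for all ten listed cell types and call til_score with its default argument.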
Immune cell quantification from methylation data
Similar immune quantification from methylation data was performed using methylCIBERSORT(28). Methylation data was first converted to a mixture file using the methylCIBERSORT R package version 0.2.0. A signature file for squamous cell lung cancer was also taken from this package; this signature was derived from TILs in squamous cell lung cancer, a very similar biological question to that of our study. These data were used as input to CIBERSORT(29) to provide relative values for each immune cell subtype included in the signature file.
Data Availability
All raw data used in this study is publicly available. Previously published CIS gene expression and methylation data is stored on GEO under accession number GSE108124; matched stromal gene expression data is stored under accession number GSE133690. Previously published CIS whole genome sequencing data is available from the European Genome Phenome Archive (https://www.ebi.ac.uk/ega/) under accession number EGAD00001003883.
Code Availability
All code used in our analysis will be made available at http://github.com/uclrespiratory/cis_immunology on publication. All software dependencies, full version information, and parameters used in our analysis can be found here.
Author Contributions
A.P. and V.H.T. contributed equally to this work, as did K.A., S.E.A.R. and T.L. A.P., V.H.T., N.M. and S.M.J. co-wrote the manuscript. S.M.J., S.A.Q., V.H.T. and A.P. conceived the study design. V.H.T., D.C. and S.A. performed stromal LCM and gene expression profiling experiments. C.P.P. performed LCM and methylation experiments. H.L-S. and P.J.C. performed genomic experiments. A.A., T.L., J.Y.H. and T.M. designed and performed IHC experiments. K.A., S.E.A.R. and Y.Y. performed cell quantification on H&E and IHC images. S.M.J., P.J.G., B.C. and R.M.T. led the bronchoscopic surveillance programme through which samples were obtained. M.F. performed histological review. P.F.D. performed pathological processing. A.P. performed bioinformatic analysis, supported by R.R. and N.M. R.E.H., K.H.C.G., C.D., A.F., C.S., C.T., S.A.Q. and N.M. gave advice and reviewed the manuscript. S.M.J. provided overall study oversight.
Competing Interests
S.A.Q. and C.S. are co-founders of Achilles Therapeutics. C.S. is a shareholder of Apogen Biotechnologies, Epic Bioscience, GRAIL, and has stock options in Achilles Therapeutics. R.R. and N.M. have stock options in and have consulted for Achilles Therapeutics.
Acknowledgements
We thank all of the patients who participated in this study. We thank P. Rabbitts, A. Banerjee and C. Read for their early development of the study. The results published here are in part based on data generated by a TCGA pilot project established by the National Cancer Institute and National Human Genome Research Institute. Information about TCGA and the investigators and institutions that constitute the TCGA research network can be found at http://cancergenome.nih.gov. R.E.H., N.M., P.J.C., and S.M.J. are supported by Wellcome Trust fellowships. S.M.J. is also supported by the Rosetrees Trust, the Welton Trust, the Garfield Weston Trust, the Stoneygate Trust and UCLH Charitable Foundation. V.T., C.P., R.E.H., S.A. and S.M.J. have been funded by the Roy Castle Lung Cancer Foundation. A.P. and D.C. are funded by Wellcome Trust clinical PhD training fellowships. H.L.-S. is funded by the Wellcome Trust Sanger Institute non-clinical PhD studentship. C.T. was a CRUK Clinician Scientist. This work was partially undertaken at UCLH/UCL, who received a proportion of funding from the Department of Health’s NIHR Biomedical Research Centre’s funding scheme (S.M.J.). R.E.H., N.M., C.S., and S.M.J. are part of the CRUK Lung Cancer Centre of Excellence. C.S., and S.M.J. are supported by Stand Up to Cancer. Y.Y. acknowledges funding from Cancer Research UK Career Establishment Award, Breast Cancer, Children’s Cancer and Leukaemia Group, NIH U54 CA217376 and R01 CA185138, CDMRP Breast Cancer Research Program Award, CRUK Brain Cancer Award (TARGET-GBM), European Commission ITN, Wellcome Trust, and The Royal Marsden/ICR National Institute of Health Research Biomedical Research Centre. S.A.Q. is funded by a CRUK Senior Cancer Research Fellowship, a CRUK Biotherapeutic Program Grant, the Cancer Immunotherapy Accelerator Award (CITA-CRUK) and the Rosetrees Trust. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.