O-linked mucin-type glycosylation regulates the transcriptional programme downstream of EGFR in breast cancer

Aberrant mucin-type O-linked glycosylation is a common occurrence in cancer. This type of O-linked glycosylation is not limited to mucins but can occur on many cell surface glycoproteins where only a small number of sites may be present. Upon EGF ligation, EGFR induces a signaling cascade but can also translocate to the nucleus, where it can directly regulate gene transcription. Here we show that upon EGF binding, human breast cancer cells carrying different O-linked glycans respond by transcribing different gene expression signatures. This is not the result of changes in signal transduction but is due to the differential nuclear translocation of EGFR in the two glyco-phenotypes. This translocation is regulated by the formation of an EGFR/galectin-3/MUC1/β-catenin complex at the cell surface that is present in cells carrying short core-1-based O-glycans characteristic of tumour cells but absent in core-2-carrying cells.


INTRODUCTION
Glycosylation of proteins is the most abundant post-translational modification and greatly increases the size, diversity and function of the proteome. Nearly all proteins that are expressed on the cell membrane or are secreted carry glycans that are involved in cell adhesion, recognition, molecular trafficking, clearance and signaling (1). Moreover, aberrant glycosylation occurs in essentially all types of human cancer making this one of the hallmarks of malignancy. Indeed, changes in glycosylation appear to be early events, as well as playing key roles in the induction of invasion and metastases (2,3).
O-linked mucin-type glycosylation (here referred to as O-linked glycosylation) is characterized by the addition of GalNAc to serine or threonine residues of proteins. Although found abundantly on mucins, which are heavily O-glycosylated, this type of glycosylation is also found on many other types of glycoproteins, including those involved in signal transduction (4,5). Altered O-linked glycosylation is one of the most prevalent glyco-phenotypes observed in solid tumors (6) and often results in O-linked glycoproteins carrying short, linear sialylated glycans rather than the longer branched glycans seen in normal epithelial cells (7). However, the presence of short linear or branched chains is not mutually exclusive, and particularly in estrogen receptor-negative breast cancers the O-linked glycans can contain branched glycans (8) (figure S1a). Aberrant O-glycosylation promotes tumour growth (9,10), leads to remodeling of the microenvironment (11) and increases metastatic potential (8,12).
Members of the epidermal growth factor receptor family (ErbBs) play a major role in cancer. ErbB2 (HER2) is involved in driving tumorigenesis in breast cancer and ErbB1 (EGFR) signaling is frequently dysregulated in cancer. Overexpression and mutation of EGFR are seen in many solid tumours. Like HER2, EGFR is an important therapeutic target, with a number of small-molecule inhibitors and anti-EGFR antibodies in clinical use (13). EGFR is over-expressed in triple negative breast cancers but activation of its signaling pathway is important in other sub-types of breast cancer (14).
A number of proteins have been shown to interact with EGFR, including the highly O-glycosylated mucin MUC1, which is upregulated and aberrantly glycosylated in breast and other carcinomas. This interaction is reported to modulate the stability of EGFR by preventing its ubiquitination, which results in the recycling of EGFR rather than its degradation (15). In turn, EGFR can phosphorylate the cytoplasmic tail of MUC1 (CT-MUC1) (16). Moreover, CT-MUC1 interacts with EGFR in the cell nucleus and promotes the binding of EGFR to chromatin and the CCND1 (cyclin D1) promoter (17).

The gene expression profile of cells treated with EGF changes with alterations in cellular O-glycosylation
To determine if O-linked glycosylation influences the transcription programme induced upon stimulation with EGF, we used the breast carcinoma cell line T47D, which expresses short, linear O-glycans, and isogenic cell lines engineered to express branched core 2 O-glycans through the transfection of GCNT1 encoding the C2GnT1 glycosyltransferase (fig S1a), which we refer to as T47D-C2GnT1 (18). We isolated two independent clones, T47D-C2GnT1-J and T47D-C2GnT1-B, which overexpressed GCNT1 by at least 30-fold (fig S1b). All the lines expressed MUC1 as shown by staining with the HMFG1 antibody (fig S1c). However, both T47D-C2GnT1-J and T47D-C2GnT1-B cells had reduced staining with the anti-MUC1 monoclonal antibody SM3 (fig S1c). As the binding of SM3 to its epitope within MUC1 is inhibited by core 2 glycans (9,18), this confirms that glycoproteins expressed by T47D-C2GnT1-J and -B carry branched core 2 glycans. We also examined the [...] To study the transcriptional program induced upon stimulation with EGF in cells with different O-glycosylation patterns, genome-wide expression microarrays were performed with T47D and T47D-C2GnT1-J cells stimulated with EGF for 2h and 6h.
Upon EGF stimulation, approximately 8% of the transcripts were significantly modulated (FDR < 0.01) after 2h or 6h of treatment in either T47D or T47D-C2GnT1-J cells (fig 1c and table S1). As expected, cyclin D1 (CCND1) was confirmed to be up-regulated, with at least twice the baseline expression (log2 fold change >= 1, fig 1c) after 6h of treatment. Some transcriptional targets of EGFR, such as the DUSP6 phosphatase that is transcriptionally induced by EGF signaling (20,21), were found to be upregulated in both cell lines at both time points (fig 1c). However, we found that some genes (such as MMP10 or ANGPTL4) were overexpressed in T47D but not in T47D-C2GnT1-J, while others such as FGFBP1, CX3CL1 and MYC were more highly expressed in T47D-C2GnT1-J (fig 1c).
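The two thresholds used above (FDR < 0.01 and log2 fold change >= 1) can be illustrated with a minimal Python sketch. The text does not state which FDR procedure or software was applied to the arrays, so the Benjamini-Hochberg adjustment here is an assumption, and the gene names, expression values and p-values are hypothetical placeholders rather than data from the study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used only as a common FDR procedure; the source
    does not say which method produced its FDR values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def log2_fold_change(treated, baseline):
    """log2 ratio of treated to baseline expression (both must be > 0)."""
    return math.log2(treated / baseline)


def select_induced(genes, fdr_cutoff=0.01, lfc_cutoff=1.0):
    """Keep genes passing both cutoffs.

    genes: list of (name, baseline, treated, p_value) tuples.
    """
    q_values = benjamini_hochberg([g[3] for g in genes])
    return [
        name
        for (name, baseline, treated, _), q in zip(genes, q_values)
        if q < fdr_cutoff and log2_fold_change(treated, baseline) >= lfc_cutoff
    ]


# Hypothetical example values, not taken from the paper.
EXAMPLE = [
    ("GENE_A", 100.0, 200.0, 0.0001),  # doubled and significant
    ("GENE_B", 100.0, 150.0, 0.0002),  # significant, but below two-fold
    ("GENE_C", 100.0, 400.0, 0.5),     # four-fold, but not significant
]

if __name__ == "__main__":
    print(select_induced(EXAMPLE))
```

A log2 fold change of 1 corresponds to a doubling of expression, which is why the two-fold criterion and the `>= 1` cutoff are interchangeable.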
Gene ontology enrichment analysis (GOEA) (table S2) of overexpressed genes with at least a two-fold change (log2 fold change >= 1) compared to non-stimulated conditions confirmed a strong enrichment for transcription factor activity and positive regulation of protein phosphorylation, as described previously for EGF stimulation of primary breast cells and breast cancer cells (21). We selected a subset of 20 genes that showed differential expression between the two glyco-phenotypes for further validation. These were classified into 4 categories: (i) membrane receptors and proteases, (ii) cytokines and growth factors, (iii) migration and metastasis related genes and (iv) transcription factors (fig 1d). We confirmed the differential expression between T47D and T47D-C2GnT1-J of 16 out of the 20 by quantitative real-time PCR (table S3). We also performed a global comparison of the difference in gene expression between the paired cell lines T47D and T47D-C2GnT1-J at 0, 2 and 6 hours after EGF stimulation (fig S2a and table S1). Among the genes differentially expressed between the two cell lines at baseline, 2 or 6 hours of treatment with EGF, we identified most of the genes validated previously (fig 1c-d and

EGF induced the differential release of microenvironment modulating factors in cells with different O-glycosylation
As MMP10, CX3CL1 and FGF-BP1 have all been associated with cancer progression, we chose this subset of genes for further study. We observed that the protein expression of the matrix metalloproteinase MMP10 was strongly induced in T47D cells after 6h of EGF stimulation (fig 2b) but was very weakly expressed in T47D-C2GnT1-J cells. In contrast, FGF-BP1 showed no protein induction detectable by immunoblot in the supernatant of T47D cells, but it was significantly induced after EGF stimulation in T47D-C2GnT1-J (fig 2b). Moreover, the chemokine CX3CL1 was increased in the supernatants of EGF-stimulated T47D-C2GnT1-J (fig 2c), whereas T47D cells showed no increase in secreted CX3CL1 at 6 hours or 24 hours after EGF stimulation. Together, these data confirm that the differential mRNA expression between T47D and T47D-C2GnT1-J of a subset of genes induced by EGF results in the differential protein expression of these genes by the two glyco-phenotypes.

CX3CL1, MMP10 and FGF-BP1 expression in breast cancer
In most breast and many other adenocarcinomas the dominant O-linked glycans are linear, short sialylated glycans based on core 1 (fig S1a), whereas in normal breast epithelial cells branched core 2 glycans are exclusively found (7). However, in ER negative (ER-ve) breast cancers core 2 based glycans appear to be the dominant O-linked glycans carried on glycoproteins (8). In a glyco-related gene expression analysis of ER positive (ER+ve) and ER-ve primary breast cancer samples we observed that the ER-ve breast tumours (that carry branched core 2 glycans) tested,

EGFR accumulation in nuclear endosomes but not activation of EGFR signaling is enhanced in cancer cells carrying core 2 O-glycans
To investigate the mechanisms whereby differences in O-linked glycosylation could influence gene transcription in response to EGF, we looked at EGFR signaling in response to EGF in T47D and T47D-C2GnT1-J. The cells were treated with EGF for [...] In addition to activation of signaling from the cell membrane, following endocytosis, EGFR can traffic to the nucleus and bind to the promoters of genes, regulating transcription (15,17,22). Therefore, a change in the localization of EGFR could be a possible mechanism to explain the differential transcriptional program induced by EGF in these cells. Because N-glycosylation can affect receptor turnover and endocytosis (23) [...] Together these results show that upon EGF binding, a significant increase in EGFR in nuclear-associated endosomes is observed in T47D-C2GnT1-J cells, which carry branched core 2 glycans, compared to T47D cells carrying core 1-based glycans.

EGFR, GAL3 and MUC1 form a complex upon EGFR activation of T47D carrying core 1 O-glycans but not in T47D carrying core 2 O-glycans
Galectin-3 is a lectin that binds preferentially to terminal beta-galactosides on mature N-glycans and elongated core 2 O-glycans (25). Galectin-3 has been shown to bind to the extracellular domains of EGFR and MUC1 (26,27). We immunoprecipitated EGFR from solubilized membrane and cytoplasmic cell extracts of T47D and T47D-C2GnT1-J cells and observed that EGFR forms a complex with galectin-3 and MUC1 in T47D cells treated with EGF but not in resting cells, as has been previously described (28). In contrast, upon EGF stimulation no MUC1/galectin-3/EGFR complex was observed in T47D-C2GnT1-J cells (fig 5a). Interestingly, some interaction of EGFR with galectin-3 was observed in unstimulated T47D-C2GnT1-J cells, but this was at a very low level (fig 5a).

β-catenin, an important transcriptional regulator in breast cancer cells, interacts with EGFR (29) and binds to the phosphorylated cytoplasmic tail of MUC1 (30). We therefore investigated if β-catenin was in a complex with EGFR and MUC1 in T47D cells. We observed that β-catenin precipitated with EGFR in EGF-stimulated T47D cells but not in starved cells (fig 5a). However, in T47D-C2GnT1-J cells we observed β-catenin to be in a complex with EGFR in both non-stimulated and EGF-stimulated cells in the absence of MUC1. These results show that extended core 2 O-linked glycosylation inhibits the interaction of EGFR with galectin-3 and MUC1 but facilitates the constitutive interaction of EGFR with β-catenin.
To study the relevance of galectin-3 binding and β-catenin interaction in the EGF

DISCUSSION
Changes in glycosylation are very common in cancer and although increased sialylation is a common event, different tumour types can show different glycosylation patterns. Indeed different cancers can be clustered according to the expression of glycosyltransferases (32). The two O-linked glyco-phenotypes investigated here represent core 1 and sialylated core 1-based glycans (T47D), and core 2-based glycans (T47D-C2GnT1). While sialylated core 1 glycans are found on the majority of breast cancer cells, and normal mammary epithelial cells express glycoproteins carrying exclusively core 2 O-linked glycans, the differential expression of these glycans is not absolute and in ER-ve breast cancers branched core 2 glycans appear to dominate although core 1 based glycans are also present (8).
Upon binding of EGF, EGFR dimerizes and activates its tyrosine kinase activity leading to a plethora of down-stream signalling pathways. However, it is now clear that ligand binding can also stimulate the nuclear localisation of EGFR (22,33-35) resulting in its direct involvement in the regulation of transcription. Here we show for the first time that the O-linked glyco-phenotype of the cell can influence the pattern of gene transcription induced by EGF binding to EGFR. As changes in O-linked glycosylation are very common in the transition to malignancy this finding is highly significant to EGFR driven cancer progression.
Global gene analysis showed that many of the genes expressed in the two glyco- [...] and CX3CL1 have all been implicated in cancer and showed differential expression between the two phenotypes. MMP10 is significantly upregulated in ER+ve breast cancers that carry linear, core 1 glycans as seen in T47D cells. In contrast, the chemokine CX3CL1 and FGF-BP1 were more highly expressed in the core 2-carrying glyco-phenotype, and our analysis shows that their expression in breast cancer is highly correlated with the ER-ve subtype, which carries more core 2 O-linked glycans.
The lectin galectin-3 can exert multiple cellular functions (25) and binds to EGFR (26) and MUC1 (27 [...] observed between T47D and T47D-C2GnT1 may be due to differential trafficking/recycling of the receptor and indeed we found EGFR accumulation in the nuclear endosomes and nuclear extracts in core 2 carrying than in the core 1 carrying [...]

Heatmaps and volcano plots were generated using the ggplot2 package as implemented in Bioconductor (45).

Statistical analysis
All data are presented as mean ± SEM. The number of biological replicates (independent experiments in cell-based assays) is stated in each figure legend as n.
Statistical analysis was performed using GraphPad Prism software and statistical significance calculated using an unpaired two-tailed t-test (for comparing two conditions) or Mann-Whitney U-test (for expression analysis with METABRIC data).
For the genome expression array a sample size of n=3 was used. Using StatMate we determined that a sample size of 3 in each group has a 90% power to detect a difference between means of 50.51% (half or double the expression) with a significance level (alpha) of 0.05 (two-tailed) using an unpaired t test.
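As a minimal illustration of how replicate measurements are summarised as a mean and standard error of the mean (SEM), the sketch below uses only the Python standard library. The function name `mean_sem` and the replicate values are hypothetical; the authors report using GraphPad Prism, not this code.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM) for a list of replicate measurements.

    SEM is the sample standard deviation (n - 1 denominator)
    divided by the square root of the number of replicates.
    """
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two replicates")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical triplicate (n=3), matching the replicate count used
    # for the arrays but not the actual measurements.
    replicates = [1.0, 2.0, 3.0]
    m, s = mean_sem(replicates)
    print(f"{m:.2f} +/- {s:.2f}")
```

With n=3 the SEM is the standard deviation divided by roughly 1.73, so error bars drawn as SEM are noticeably narrower than the spread of the individual replicates.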

[Figure labels: Branched core 2; Normal mammary epithelium; All core 2 glycans]

Suppl. Figure 2: Differential gene expression between two glyco-phenotypes. (a) T47D and T47D-C2GnT1-J cells were treated as in figure 1c of the main text. Volcano plots of differentially expressed genes between T47D core-1 and T47D core-2 breast cancer cells before and after 2h or 6h of treatment with EGF are shown. Significantly differentially expressed genes (FDR < 0.01) in cells before treatment (untreated) or after 2h or 6h of treatment with EGF (100ng/ml) are highlighted in red (up-regulated) and green (down-regulated). Relevant transcriptional targets of EGF signaling are highlighted as triangles. (b) T47D and a second clone, T47D-C2GnT1-B, were starved for 24 hours and treated with EGF (100ng/ml) for 2h. Transcript levels of MMP10 and CX3CL1 were assessed by quantitative PCR using PUM1 as a housekeeping gene, quantified using the ΔΔCt method and shown relative to the expression in starved, non-treated T47D cells (*P<0.05, **P<0.01; analysis by Student's t-test, n=3).
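The relative quantification mentioned in panel (b) can be sketched as follows. This is a minimal Python illustration of the standard 2^(-ΔΔCt) calculation, assuming perfect amplification efficiency (a doubling per cycle); the function name and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct values in the sample of interest
    (for example EGF-treated cells) for the target gene and the
    housekeeping gene.
    ct_*_calibrator: the same two Ct values in the calibrator sample
    (for example starved, non-treated cells).
    Assumes 100% PCR efficiency for both amplicons.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles
    # earlier after treatment while the housekeeping gene is unchanged.
    fold = relative_expression(24.0, 22.0, 27.0, 22.0)
    print(fold)
```

Normalising to a housekeeping gene such as PUM1 corrects for differences in input RNA, and dividing by the calibrator sets the starved, non-treated condition to a relative expression of 1.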