Epigenetic mutational landscape in breast cancer: role of the histone methyltransferase gene KMT2D in triple negative tumors

Purpose Epigenetic regulatory proteins such as histone methyltransferases modulate several cellular functions, some of them associated with the generation of oncogenic processes. Mutations in genes involved in these functions have recently been associated with cancer, and strategies to modulate their activity are currently in clinical development. Methods Using data extracted from the METABRIC study, we searched for mutated genes linked with detrimental outcome in invasive breast carcinoma (n = 772). Then, we used the downstream signature of each mutated gene to associate that signature with clinical prognosis using the online tool “Genotype-2-Outcome” (http://www.g-2-o.com). Next, we performed functional annotation analyses to classify genes by function, and focused on those associated with the epigenetic machinery. Results We identified KMT2D, SETD1A and SETD2, included in the lysine methyltransferase activity function, as linked with poor prognosis in invasive breast cancer. KMT2D, which codes for a histone methyltransferase that acts as a transcriptional regulator, was mutated in 6% of triple negative breast tumors and linked to poor survival. Genes regulated by KMT2D included RAC3, KRT23 and KRT14, among others, which are involved in cell communication and signal transduction. Finally, low expression of KMT2D at the transcriptomic level, which mirrors what happens when KMT2D is mutated and functionally inactive, confirmed its prognostic value. Conclusion In the present work, we describe epigenetic modulating genes that are mutated in breast cancer. We identify the histone methyltransferase KMT2D, which is mutated in 6% of triple negative tumors and linked with poor survival.


Introduction
Advances in the analysis of the genomic landscape of human cancers have permitted the identification of different molecular alterations, including mutations, copy number variations and gene rearrangements, which may be linked with the genesis and maintenance of tumors [1,2]. Unfortunately, few of the identified molecular alterations offer druggable opportunities [1,2]. Well-known exceptions are alterations affecting protein kinases, whose catalytic activity can be pharmacologically inhibited [2]. This has been the case for agents targeting mutated or amplified protein kinases, such as EGFR or HER2 in lung and breast cancers [3][4][5]. Similarly, chromosomal rearrangements can produce fusion proteins with kinase activity amenable to pharmacological inhibition, such as Trk fusion proteins [6,7].
Changes in gene activity that are not produced by an alteration of the DNA nucleotide sequence are known as epigenetic modifications [8]. Alterations in proteins involved in epigenetic regulation can affect genetic programs that in turn impact several cellular functions. Ultimately, such alterations can translate into different diseases, from cancer to neurological or aging disorders, among others [8,9].
Epigenetic regulatory proteins include histone-modifying enzymes, histone proteins, chromatin remodeling complexes and DNA methylation enzymes [8]. Mutations in genes coding for proteins involved in several of these functions have already been described, and some of them have been associated with cancer [10]. Because each of these proteins controls the expression of many genes, their inhibition can have a broad effect, affecting multiple pathways at the same time [10]. In this context, agents that target epigenetic enzymes have recently been described and are currently in clinical development [11].
In this study, we evaluated the mutational status of genes involved in epigenetic control in breast cancer, identifying KMT2D as mutated in around 6% of triple negative tumors and linked with a particularly detrimental prognosis.

Identification of mutated genes in breast cancer
Data were extracted from the Breast Cancer METABRIC study (n = 2509), available at cBioPortal (http://www.cbioportal.org). First, we searched for mutated genes in samples from patients with invasive breast carcinoma (n = 772), including luminal A, luminal B, HER2+ and basal-like tumors. Genes mutated in more than 2.5% of the patients were then identified. Mutation frequencies were independently confirmed using the TCGA database (n = 818).
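The frequency filter described above can be sketched as follows. This is a minimal illustration, assuming mutation calls are available as (patient, gene) pairs, for example from a cBioPortal export; the function name, input format and toy data are hypothetical. Each patient is counted once per gene, however many mutations they carry in it.

```python
from collections import defaultdict


def frequent_mutated_genes(mutations, n_patients, threshold=0.025):
    """Return {gene: fraction of patients} for genes mutated in more than
    `threshold` of the cohort.

    `mutations` is an iterable of (patient_id, gene) pairs, one per mutation
    call (hypothetical input format). A patient with several mutations in the
    same gene is counted only once for that gene.
    """
    patients_per_gene = defaultdict(set)
    for patient_id, gene in mutations:
        patients_per_gene[gene].add(patient_id)
    return {
        gene: len(patients) / n_patients
        for gene, patients in patients_per_gene.items()
        if len(patients) / n_patients > threshold  # strictly "more than 2.5%"
    }


# Toy example with 100 patients (made-up data, not METABRIC).
calls = [("p1", "GENE_A"), ("p1", "GENE_A"), ("p2", "GENE_A"), ("p3", "GENE_A"),
         ("p4", "GENE_B"), ("p5", "GENE_B")]
calls += [(f"p{i}", "GENE_C") for i in range(10, 20)]
print(frequent_mutated_genes(calls, n_patients=100))
# {'GENE_A': 0.03, 'GENE_C': 0.1}
```

GENE_B (2 of 100 patients) falls below the 2.5% cut-off, while the duplicate GENE_A call for patient p1 is not double-counted.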

Functional analyses
For the functional annotation analysis of the set of mutated genes, we used the gene list enrichment tool DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov/). For this analysis, we selected genes with a mutation frequency greater than 2.5% that were linked with poor prognosis.
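The core of this kind of over-representation analysis is a one-sided hypergeometric test: how surprising is it that k of the n query genes carry an annotation shared by K of the N background genes? The sketch below is a simplified illustration, not DAVID's implementation (DAVID reports a modified Fisher exact p-value, the EASE score), and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background population
    K: background genes annotated with the functional term
    n: genes in the query list (e.g. the frequently mutated genes)
    k: query genes annotated with the term
    """
    total = comb(N, n)
    # Sum the probability of observing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: 3 of 44 query genes carry a term annotated on
# 50 of 20,000 background genes.
p = enrichment_p_value(k=3, n=44, K=50, N=20000)
print(f"{p:.2e}")
```

Python integers are arbitrary precision, so the binomial coefficients are exact and only the final division is rounded.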
For the functional analysis of the KMT2D-associated gene signature (S1 Table), we used the online tool Enrichr (http://www.amp.pharm.mssm.edu/Enrichr/). Enriched gene sets were selected using an adjusted p-value < 0.05. Genes were split into overexpressed and underexpressed sets, and the "KEGG 2015" library was chosen for the analyses and for the calculation of the "combined score".
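The adjusted p-value filter can be illustrated with a Benjamini-Hochberg correction. This is a sketch under the assumption that the adjustment is Benjamini-Hochberg (as we understand Enrichr reports); the gene-set names and raw p-values below are hypothetical, and the Enrichr "combined score" is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone: q_(r) = min over s >= r of p_(s) * m / s.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched(term_p_values, alpha=0.05):
    """Keep gene sets whose BH-adjusted p-value is below `alpha`."""
    names = list(term_p_values)
    q = benjamini_hochberg([term_p_values[t] for t in names])
    return [t for t, qi in zip(names, q) if qi < alpha]


# Hypothetical raw p-values for four KEGG-style gene sets.
raw = {"pathway_A": 0.01, "pathway_B": 0.04, "pathway_C": 0.03, "pathway_D": 0.2}
print(select_enriched(raw))  # ['pathway_A']
```

Although three of the four raw p-values are below 0.05, only one gene set survives once the correction for multiple testing is applied.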

Outcome analyses
To evaluate the relationship between the presence of mutated genes and patient clinical outcome, we used the Genotype-2-Outcome online tool (http://www.g-2-o.com) [12] (S1 Table). This publicly available tool allows the evaluation of clinical outcome in all breast cancer subtypes (All, Triple Negative Breast Cancer, Luminal A, Luminal B and HER2+) by exploring the association with prognosis of the transcriptomic signature linked to a specific mutation.
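To make the idea of a mutation-associated transcriptomic signature more concrete, the sketch below collapses a signature into one score per patient. The scoring rule (mean z-score of upregulated genes minus mean z-score of downregulated genes) is a hypothetical choice for illustration and is not claimed to be the algorithm used by Genotype-2-Outcome. RAC3 and KRT14 are taken from the KMT2D signature described in the Results, but the expression values are made up.

```python
from statistics import mean, pstdev


def zscore_by_gene(expr):
    """Standardize each gene's expression across patients (mean 0, SD 1)."""
    standardized = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        standardized[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return standardized


def signature_scores(expr, up_genes, down_genes):
    """Per-patient score: mean z-score of up-regulated genes minus mean
    z-score of down-regulated genes (a hypothetical scoring rule)."""
    z = zscore_by_gene(expr)
    n_patients = len(next(iter(expr.values())))
    scores = []
    for j in range(n_patients):
        up = mean(z[g][j] for g in up_genes) if up_genes else 0.0
        down = mean(z[g][j] for g in down_genes) if down_genes else 0.0
        scores.append(up - down)
    return scores


# Made-up expression values for four patients.
expr = {"RAC3": [1.0, 2.0, 3.0, 4.0], "KRT14": [4.0, 3.0, 2.0, 1.0]}
print(signature_scores(expr, up_genes=["RAC3"], down_genes=["KRT14"]))
```

A score of this kind can then be dichotomized at the median, as described for both outcome analyses.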
To evaluate the relationship between the expression of the genes and patient clinical prognosis, the KM Plotter online tool (http://www.kmplot.com) [13,14] was used. This database permits the evaluation of overall survival (OS) and relapse-free survival (RFS) in basal-like, luminal A, luminal B, HER2+ and triple negative breast cancers.
For both outcome analyses, patients were separated according to the median value: patients above the threshold were considered to have "high" expression, while patients below the threshold were defined as having "low" expression.
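The median dichotomization and the log-rank comparison reported in the Results can be sketched as follows. Both online tools perform these steps internally; this is only a minimal illustration with made-up data, and assigning values equal to the median to the "low" group is an assumption, not a documented rule of KM Plotter or Genotype-2-Outcome.

```python
from math import erfc, sqrt
from statistics import median


def median_split(values):
    """Label each patient 'high' (above the median) or 'low'.

    Values equal to the median are assigned to 'low' here; this tie rule is
    an assumption, not taken from KM Plotter or Genotype-2-Outcome.
    """
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


def log_rank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = relapse/death, 0 = censored;
    groups: 'high' / 'low' labels. Returns (chi-square, p-value), 1 df.
    """
    data = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n_high = at_risk.count("high")
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_high = sum(1 for ti, e, g in data if ti == t and e and g == "high")
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Toy cohort (made-up): expression values and relapse times.
expression = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = median_split(expression)
chi2, p = log_rank_test(times, events, groups)
print(groups, round(chi2, 2), p)
```

In this toy cohort every "high" patient relapses before every "low" patient, so the test returns a small p-value; hazard ratios, which the tools estimate separately, are not computed here.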

Results
By using the METABRIC database, we identified 172 mutated genes in the analyses of the 772 samples from invasive breast tumors, 59 of which were mutated in more than 2.5% of the samples. Next, we evaluated the impact of these genes on patient outcome using the online tool Genotype-2-Outcome (http://www.G-2-O.com/) [12] (Figure 1A). This application identifies the transcriptomic signature associated with the presence of the mutation in patients. Using this approach, 44 of the mutated genes had an associated signature linked to detrimental prognosis in breast cancer (Figure 1A). To gain insight into the biological function of the mutated genes, we performed a functional annotation analysis. Protein binding, kinase activity, DNA binding and transcription factor binding were among the functions that grouped the largest number of genes (Figure 1B).
Then, we studied the mutational frequency of the identified genes in each breast cancer subtype. Mutations in some of these genes have been widely described in breast cancer, as is the case for TP53 in luminal and HER2+ tumors (Figure 2A). In TNBC, the most frequently mutated genes (more than 8% of tumors) included SYNE1, CDH1 and DNAH11 (Figure 2A). In HER2+ disease, PIK3CA was mutated in more than 40% of tumors. Of note, mutated genes found in TNBC covered a broader range of functions than those in the other subtypes (Figure 2B). Because epigenetic enzymes are currently under evaluation as druggable targets, we focused on genes with this function and selected the three genes grouped under the "Histone-lysine N-methyltransferase activity" function in the functional analysis: KMT2D, SETD2 and SETD1A (Figure 1B). Next, we confirmed the presence of these mutations in the different breast cancer subtypes using TCGA data (Table 1).
According to TCGA data, KMT2D was mutated in 6% and SETD2 in 1.2% of TNBC, confirming the data obtained with METABRIC. However, mutations in the other breast cancer subtypes were either not confirmed or found at a much lower percentage than in METABRIC. The proportion of SETD1A mutations was not confirmed in TCGA for any of the subtypes (Table 1). Next, we aimed to further explore the impact of mutations in these two genes on patient prognosis by evaluating the effect of their associated transcriptomic signatures in breast cancer (all subtypes). As KMT2D and SETD2 mutations were consistently detected in TNBC in both databases (METABRIC and TCGA), we next explored whether their mutational signatures were associated with detrimental prognosis in this specific tumor subtype. Notably, the transcriptomic signatures associated with both KMT2D and SETD2 were linked with poor prognosis (HR 0.58, CI: 0.45-0.74, log rank p = 1.9e-05; and HR 0.55, CI: 0.43-0.71, log rank p = 4.2e-06; respectively) (Figure 3B).
From here, we focused on KMT2D, as it was the most prevalent mutated gene in both datasets and was strongly associated with poor outcome. KMT2D is a histone methyltransferase that acts as a transcriptional regulator. The complete list of deregulated genes included in the KMT2D-associated transcriptomic signature is shown in S1 Table, and the functions of these genes, determined with the online tool Enrichr, are displayed in Figure 4. Most downregulated genes were included in the cell communication function, followed by tyrosine metabolism and extracellular matrix receptor interaction (S1 Table). Genes encoding keratins, such as KRT23 and KRT14, were among the most relevant genes included in the cell communication function (Figure 4).
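As a quick consistency check on the reported statistics, a hazard ratio and its 95% confidence interval can be converted back into a log-scale standard error, a Wald z statistic and a two-sided p-value. This sketch assumes the interval is symmetric on the log scale (HR × exp(±1.96·SE)); because the published values are rounded and the paper reports log-rank rather than Wald p-values, the results should only agree in order of magnitude.

```python
from math import erfc, log, sqrt


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover the log-scale standard error, Wald z and two-sided p-value
    from a hazard ratio and its 95% confidence interval."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return se, z, p


for name, hr, ci_low, ci_high in [("KMT2D signature", 0.58, 0.45, 0.74),
                                  ("SETD2 signature", 0.55, 0.43, 0.71)]:
    se, z, p = wald_from_ci(hr, ci_low, ci_high)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.1e}")
```

For the KMT2D signature this gives a p-value on the order of 1e-05, and for SETD2 one on the order of 1e-06, in line with the reported log-rank values of 1.9e-05 and 4.2e-06.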
The most relevant upregulated gene was RAC3, which encodes a GTPase belonging to the RAS superfamily of small GTPases involved in cell proliferation (S1 Table).
Last, we explored the functional consequences of the KMT2D mutations present in the samples of the METABRIC database. To identify these mutations, we used the online tool cBioPortal (Figure 5A). Missense mutations were scattered along the full length of the protein and were the most abundant molecular alterations, followed by truncating mutations (Figure 5B). The functional impact of these mutations, evaluated with three different tools (Mutation Assessor, SIFT and PolyPhen-2), is displayed in Figure 5C.
As shown, between 40% and 55% of KMT2D mutations were predicted to have a functional impact. This suggests that these mutations lead to an abnormal protein unable to perform its normal function, mimicking a lack of expression of the gene. To test this hypothesis, we explored whether low expression of KMT2D recapitulated the outcome observed for mutated KMT2D. Using the online tool KM Plotter, which links the transcriptional expression of a gene with patient outcome [14], we found that low transcriptomic levels of KMT2D were associated with detrimental prognosis (relapse-free survival) in all breast tumors (HR 0.64, CI: 0.55-0.79, log rank p = 2.4e-08) (Figure 5D), as well as in the triple negative subtype (HR 0.71, CI: 0.551-0.98, log rank p = 0.035) (Figure 5E).
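The fraction of KMT2D mutations predicted to have a functional impact can be tallied per predictor as in the sketch below. The call labels are the usual output categories of Mutation Assessor, SIFT and PolyPhen-2, but which categories count as "functional impact" is an assumption made for this illustration, and the three example mutations are made up.

```python
# Assumed mapping of predictor calls to "functional impact"; which labels
# count as damaging is a choice made for this sketch.
DAMAGING_CALLS = {
    "MutationAssessor": {"high", "medium"},
    "SIFT": {"deleterious"},
    "PolyPhen-2": {"probably_damaging", "possibly_damaging"},
}


def damaging_fraction(predictions):
    """Fraction of scored mutations called damaging by each predictor.

    `predictions` holds one dict per mutation mapping predictor -> call.
    Mutations without a call for a predictor (e.g. truncating variants,
    which these missense predictors typically do not score) are skipped
    for that predictor.
    """
    fractions = {}
    for tool, damaging in DAMAGING_CALLS.items():
        calls = [m[tool] for m in predictions if tool in m]
        if calls:
            fractions[tool] = sum(c in damaging for c in calls) / len(calls)
    return fractions


# Made-up calls for three KMT2D missense mutations.
example = [
    {"MutationAssessor": "medium", "SIFT": "deleterious", "PolyPhen-2": "benign"},
    {"MutationAssessor": "low", "SIFT": "tolerated", "PolyPhen-2": "probably_damaging"},
    {"MutationAssessor": "neutral", "SIFT": "deleterious", "PolyPhen-2": "benign"},
]
print(damaging_fraction(example))
```

Because each tool uses its own model and thresholds, the per-tool fractions can differ, which is why a range rather than a single percentage is reported.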

Discussion
In the present article, we report the identification of genes that are mutated in breast cancer and associated with detrimental outcome. After functional analysis of the identified genes, we focused on the "Histone-lysine N-methyltransferase activity" function and found that the histone methyltransferase gene KMT2D was mutated in around 6% of the TNBC samples evaluated and was associated with poor prognosis in this breast cancer subtype.
KMT2D is a histone methyltransferase that methylates lysine 4 of histone H3 (H3K4) [16]. The encoded protein belongs to a large protein complex termed ASCOM, which is one of the transcriptional regulators of estrogen receptor genes [16,17].
KMT2D mutations have been associated with the development of different tumors, including small cell lung cancer [16], esophageal squamous cell carcinoma and large B-cell lymphoma [16]. Although mutations in this gene have been described in many other tumors [16,18], they had not been previously reported in breast cancer, nor had their impact on patient outcome been assessed.
Recent data suggest that KMT2D is involved in the recruitment and activation of relevant breast cancer genes, including FOXA1, PBX1 and ER [17]. As described in the present article and other reports, most KMT2D mutations are frameshift and nonsense mutations in the SET and PHD domains, respectively [17]. Most of the described mutations result in loss of the protein or in reduced methyltransferase activity [19].
This can result in defective enhancer regulation and, subsequently, in altered transcription of several genes or increased genomic instability [8,20]. Our data are consistent with this, as shown by the transcriptomic signature associated with KMT2D mutation and, in particular, by the upregulation of RAC3 discussed below. Of note, KMT2D has different effects depending on the cellular context, owing to the recruitment of different transcription factors [16].
When evaluating the transcriptomic signature linked to KMT2D mutations, we found that RAC3 was one of the most significantly upregulated transcripts. RAC3 encodes a GTPase that belongs to the RAS superfamily of small GTP-binding proteins, and it has been linked with the pathophysiology of many solid tumors, including breast cancer [15,21,22]. In breast cancer, RAC3 regulates invasion and migration, participating in the metastatic process [15].
Finally, we confirmed that KMT2D expression levels were associated with clinical outcome in the same way as the presence of KMT2D mutations, which mostly cause loss or reduction of protein expression or decreased activity. This result indirectly supports the robustness of the mutational gene signature in relation to outcome.
In conclusion, in the present work, we identify the histone methyltransferase gene KMT2D as mutated in around 6% of triple negative breast tumors and linked with poor survival.

Competing interests
No competing interests to declare.

Availability of data and materials
All data generated and/or analyzed during the current study are available from the corresponding author on reasonable request.