Abstract
WGCNA (weighted gene co-expression network analysis) can provide a system-level insight into molecular interaction. To understand the co-expression pattern of miRNA in the progress of colon cancer development, this study constructed the WGCNA co-expression network to find the key microRNA of clinical significance and therefor provide possible personalized treatment targets. The miRNA expression data and clinical follow-up information of 442 valid colon cancer samples were obtained from TCGA. After the co-expression network was constructed, we found the turquoise module has the strongest correlation with stage IV. The miRNAs of the key module are enriched by GO and analyzed by KEGG, and the HUB-miRNAs are screened out through the network node degree. For clinical significance validation, data of 65 cases of colon cancer in GES29623 were analyzed by Cox regression and Kaplan-Meier, in 9 HUB-miRNAs, 2 miRNAs associated with overall survival rate were screened out. They can divide patients into high-risk group and low-risk group. Conclusion: has-miR-126-3p and has-miR-137 are the key miRNAs of COAD metastasis, they are of prognostic value in COAD.
1. Introduction
Colorectal cancer (CRC) is one of the most common cancers in the world and the fourth most common cause of cancer-related death(1, 2). The molecular mechanism of colon cancer has yet to be fully discussed. The survival rate of colon cancer patients varies significantly depending on the stage at the time of discovery. Early diagnosis is a key factor for patients with colorectal cancer to achieve a better prognosis(3). The survival rate of early colon cancer (I) was about 90%, while the survival rate of advanced colon cancer (Ⅲ) decreased to 65%, due to the lack of sensitive and effective biomarkers, the 5-year survival rate of patients with distant metastasis decreased to 12.5%(4). Early screening has been proven to have a great impact on increasing survival rate(5). To understand the mechanism of the occurrence development invasion and metastasis of colon cancer is essential to develop targeted strategies.
MiRNA is a small non-coding RNA, that regulates gene expression and participates in almost all biological processes of the human body(6). They can affect the occurrence and prognosis of diseases by regulating mRNA in target cells, which is the regulator of a variety of human diseases(7). Functional studies have shown that all known tumor processes, including apoptosis, proliferation, survival and metastasis, are regulated by miRNA(8). In recent years, the key role of miRNA in the tumor has attracted wide attention(9). As a biomarker of many kinds of cancer, miRNA has a broad application prospect. The unique ability of miRNA to affect multiple downstream pathways represents a new approach to cancer treatment(10). Several kinds of miRNA as biomimetic tumor inhibitors have begun the first phase of clinical trials(11). At present, there are still few studies about the role of miRNA in COAD (colon adenocarcinoma), and the current studies are mostly based on the comparison of differences between groups, limited by the number of samples these studies can only be analyzed within a very limited statistical framework.
WGCNA (weighted gene co-expression network analysis) is a systematic biological analysis strategy that can be used to analyze gene association patterns between a large number of samples and identify highly collaborative gene sets without information loss. The key biomarkers or therapeutic targets can be found according to the inherent association between gene sets and phenotypic correlation(12). Compared with other analysis tools, WGCNA can look for gene sets in massive gene information, it makes full use of the information of each sample, and eliminates the problem of multiple hypothesis test correction. It has played an important role in the field of biomarker discovery in recent years. It has been reported that WGCNA as an analyze tool has been applied to a variety of cancers, such as oral cancer pancreatic ductal adenocarcinoma and gastric cancer, to study the relationship between gene expression matrix and clinical characteristics, in order to determine the rules for predicting the survival outcome of patients(13-15).
2. Materials and methods
2.1 Samples and preprocessing
The miRNA sequencing and clinical data of 455 COAD samples were obtained from TCGA database (https://cancergenome.nih.gov/). Among the related clinical follow-up data, the total survival time (days), staging, lymphatic metastasis, body weight and condition were selected for follow-up analysis. The gene expression profile of GSE S29623 for validation was downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The further data processing was performed on the RStudio (v1.1.462) and complied with the human subject protection and data access policies of TCGA and GEO.
2.2 Weighted gene co-expression analysis
A goodSamplesGenes function is used to test whether the miRNA data matrix is qualified. After data cleaning a co-expression network is constructed according to the protocol of WGCNA in R environment. According to the principle of WGCNA analysis, the similarity of different gene expression profiles is understood based on the Pearson correlation coefficient, which is an index to measure the consistency of gene expression profiles among samples(16). Then, the similarity matrix is transformed into the adjacent matrix by using the power adjacency function, and the connection strength between the pairs of nodes is encoded. The selected soft threshold function is used to evaluate whether the topology of the network is scale-free or not. β is a soft threshold-power parameter. We went through the parameters and determined that the power with β = 2 (scale-free R2 ≥ 0.87) can ensure the scale-free network (Fig 1). Then, hierarchical clustering trees are constructed from different branches of trees that represent different gene modules. According to different needs, there are three different ways to build the network and identify the module. In this study, we use the one-step method function for network construction and consistency module detection.
2.3 Identifying important modules and key miRNAs related to target clinical features
We represent each module by calculating the ME (module eigengene), which can be used to capture the largest variation in the module to identify meaningful modules. We associate ME with clinical features to quantify module-feature correlation and find the most important association. Through the signed KME and module Eigengenes function, and using miRNet to build a network, through node degree, we screened out the key miRNAs in the relevant modules (17).
2.4 Functional analysis
A large number of target genes were introduced through mirPathV3 database, and a variety of gene analysis tools were used to analyze the modules of interest, miRNA and target genes, and the information of cell composition, molecular function and KEGG pathway were obtained(18).
2.5 Hub-miRNAs Verification
In order to ensure the reliability of the results, it is necessary to ensure that the data used for verification come from different samples, so we use databases from different sources to carry out comprehensive survival analysis and evaluation of the key miRNA. We used GSE29623 (N = 65), a colon cancer-related miRNA chip from the GEO database, to analyze the survival of the screened miRNA by cox regression and the KM method. Finally, the HUB-miRNA is verified again by Oncomir (http://www.oncomir.org/).
3 Results
3.1 The clinical information and miRNA sequencing of 442 samples were screened out by goodSamplesGenes function, each sample contains 2149 MIRNA expression information, and finally, a matrix containing 442 × 2149, a total of 949858 miRNA is obtained. WGCNA analysis showed that there were 9 functional modules (black, green, magenta, red, yellow, pink, turquoise, blue, brown). The grey model represents genes that cannot be classified and have nothing to do with the disease being analyzed (Fig 2). The miRNAs selected in each category draws a network heat map in its corresponding module (Fig 3) The hierarchical clustering tree was obtained (Fig 4). Each module has a high degree of independence. Each module was combined with their corresponding clinical traits. According to the combination, we found the module with the strongest correlation with stage IV, the phase with distant metastasis, is the turquoise module, P < 0.003 (Fig 5).
3.2 Extraction of key miRNAs for functional annotation KEGG aathway enrichment analysis
In order to functionally annotate the center miRNA, we analyzed the miRNAs using miRNet, to determine the KEGG and GO terms. According to the functional analysis of KEGG and GO, the turquoise module is mainly concentrated in cancer-related words such as proteoglycans in cancer and TGF-beta signals. There are 85 miRNAs in the turquoise module that meet the condition of | kME | > = threshold (0.8), to narrow the range, we set up the interaction network of miRNA-GENE by using all 255 miRNA in the turquoise module and their target genes verified by experiments in intestines (19). 9 miRNAs in the center of the network screened out by node degree as the HUB-miRNAs, hsa-mir-186-5p, hsa-mir-27b-3p, hsa-mir-130a-3p, hsa-mir-137, hsa-mir-494-3p, hsa-mir-337-3p, hsa-mir-31-3p, Hsa-mir-126-3p, hsa-mir-29c-5p (Fig 6). Through KEGG pathway REACTOME and GO analysis of key miRNA, it can be found that these 9 candidate miRNAs are associated with a variety of cancers such as colon cancer, prostate cancer, renal cell carcinoma, etc(Fig 7-10).
3.3 Prognostic value verification
In order to validate the prognostic value of HUB-miRNAs found in this study as biomarkers, we evaluated whether HUB-miRNAs were associated with the overall survival rate of COAD patients. Cox regression and KM analysis were performed using GSE26923 (n=65) chips from GEO database. It was found that in the HUB-miRNA of turquoise hsa-mir-137 (P < 0.031636223) and hsa-mir-126-3p (P < 0.027019656) were significantly correlated with overall survival rate (Fig 11). Using ONCOMIR for survival analysis, it was found that has-miR-126-3p and has-miR-137 could divide patients with COAD into high-risk group and low-risk group S = 3.378*EmiR-137+1.473*EmiR-126-3p. P< 0.007865 (Fig 12).
2 Discussion
Several cases of massive data mining miRNA with the WGCNA method have been reported. The characteristic of this method is that the correlation coefficient of the expression value is taken to the N power, so that their distribution is more in line with the scale-free network the true distribution in nature. It is used to describe the correlation pattern between genes with large amounts of data, full utilization of information makes it an ideal tool for predicting biomarkers(15, 20). It is common to select the few miRNAs that satisfy the condition of e | kME | > = threshold (0.8) for follow-up analysis, but it may lead to information loss. In the current study we select all the miRNAs in the module for network analysis, reduce the information loss, and focus on the center miRNAs which are closest to the relevant pathway. The molecular function and cell function were annotated by GO, we used another 65 samples of GSE26923 to verify the prognostic value of HUB-miRNAs.
In general, miRNA can play a negative role in the regulation of mRNA expression by binding to the complementary sequences in the target mRNA 3’-UTR, resulting in translation inhibition and / or target degradation(21). In recent years, increasing evidence has proved that miRNAs could also upregulate gene expression in specific cell types and conditions with distinct transcripts and proteins(22). It has found that the role of miRNA depends on MiRISC, and the dominant role of animal miRISC binding to target mRNAs are considered to be transcriptional instability, the increase of a certain miRNA may represent the instability of many cancer-related factors(23). It is an ideal and efficient therapeutic target, which is closely related to the occurrence and development of the tumor. KEGG analysis of the 9 candidate miRNAs in the turquoise module and their target genes showed that they were significantly associated with colon cancer (P = 4.09e-7), and with many pathways in cancer, including WNT and MAPK signals, TGF-beta and p53 VEGF and it is also closely related to cell cycle pathways. Our data show that these miRNAs are key regulators of these core carcinogenic functional pathways in COAD.
In order to confirm the prognostic value of HUB-miRNAs, we verified two miRNAs (hsa-mir-137, hsa-mir-126-3p) through the data from other sources, through which patients can be divided into the high-risk group and low-risk group. The HUB-miRNAs regulate 21 target genes that have been shown to be associated with colon cancer. It is reported hsa-mir-137 upregulated 6.6-fold in the lymph node positive colon cancer group compared with the lymph node negative colon cancer group(24). Its downstream target genes include CDK6, CDC42. CDK6 plays an important role in the progress of G 1 phase and G 1 / S transition in the cell cycle(25). It can regulate the activity of tumor suppressor protein Rb by phosphorylation and has been shown to act as an anti-tumor factor in a variety of cancers such as breast cancer / liver cancer / colon cancer(26). CDC42 plays an important role in regulating cell cycle(27), CDC42 expression was associated with a longer survival rate in breast cancer (P < 0.025)(28). Overexpression of hsa-mir-126-3p can promote cell proliferation and migration by activating PI3K/AKT signaling pathway(29). Hsa-mir-126-3p is reported as a novel regulator of PVC-mediated vessel stability during tumor angiogenesis(30). The expression of hsa-mir-126-3p is increased in patients with lung adenocarcinoma. As a biomarker for the diagnosis of early lung cancer, hsa-mir-126-3p is highly sensitive and specific(31). Its target genes include E2F1 VEGFA FOXO3. E2F1 gene polymorphism is considered to be associated with lung cancer head and neck cancer and other cancers. VEGFA gene polymorphism has been confirmed to be associated with metastatic colorectal cancer bladder cancer, renal cell carcinoma, prostate cancer and other cancers(32-35). FOXO 3a plays an inhibitory role in cancer and is closely related to malignant tumors such as breast cancer, colon cancer, prostate cancer, bladder cancer and nasopharyngeal carcinoma(36). We believe that these two kinds of miRNAs participate in the progress of COAD and increase the metastasis potential through their respective mechanisms. Their combination can be used as a biomarker to predict the behavior of COAD. It is a promising drug target for inhibiting tumor metastasis and increase survival rate.
There are also many imperfections in our research. The sample size of GSE chips for verification (n=65) is relatively smaller than WGCNA data (n=442). Fewer miRNAs were detected in GSE chips. There may be more HUB-miRNAs that have prognostic value. The direct experimental verification of the mechanism of HUB-miRNAs in the metastasis of colorectal adenocarcinoma needs to be done in the follow-up study.
This work was supported by the National Natural Science Foundation of China (http://www.nsfc.gov.cn/) under Grant No.81572082; Jilin Science and Technology Foundation (http://kjt.jl.gov.cn/) under Grant No.20150414015GH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Disclosure of interest
The authors report no conflict of interest.
The datasets generated and analyzed in this paper are available in the TCGA (https://portal.gdc.cancer.gov/) and GEO (www.ncbi.nlm.nih.gov/geo/) databases.