Hsa-mir-210 as a novel signature predicts survival in lung squamous cell carcinoma using bioinformatics analysis

A growing number of studies have shown that miRNA expression is closely related to tumor occurrence and plays an essential role in tumor development and metastasis. This study focused on lung squamous cell carcinoma. Data were downloaded from the TCGA database and analyzed for differential expression, and the results were then verified in the GEO database. Using the cut-off criteria (P<0.05, |logFC|≥2), we screened 38 up-regulated and 14 down-regulated miRNAs from TCGA. After verification in the GEO database, four up-regulated miRNAs (hsa-mir-205, hsa-mir-210, hsa-mir-182, hsa-mir-224) and one down-regulated miRNA (hsa-mir-451) were obtained, which provide a new direction for the diagnosis of lung squamous cell carcinoma. Survival analysis showed that the expression level of hsa-mir-210 was significantly correlated with patient survival. Univariate and multivariate Cox analyses indicated that hsa-mir-210 was an independent prognostic factor in lung squamous cell carcinoma. Functional enrichment analysis showed that hsa-mir-210 was closely related to positive and negative regulation of cell proliferation, DNA transcription, the VEGF signaling pathway, the MAPK signaling pathway, the HIF-1 signaling pathway and the choline metabolic pathway. In summary, this study suggests that hsa-mir-210 could be a potential prognostic factor for lung squamous cell carcinoma.

Author Summary
MicroRNAs are small single-stranded RNA molecules that participate in the regulation of various biological functions through indirect regulation of gene expression, and have been reported to play an important role in the occurrence, development, invasion and metastasis of tumors. In recent years, research on miRNAs has grown rapidly, and evidence for their role in tumors has accumulated.
This study focused on squamous cell carcinoma of the lung, for which there are relatively few miRNA studies. Through high-throughput data analysis, miRNAs differentially expressed between lung squamous cell carcinoma tissues and normal tissues were identified. These differentially expressed miRNAs provide a new direction for the early diagnosis of patients. Survival analysis was then conducted to find miRNAs significantly correlated with the overall survival time of patients, and multivariate analysis was conducted to exclude the influence of other clinical factors and to determine independent risk factors (miRNAs) affecting patient survival, so as to provide new targets for the treatment and survival prediction of patients.


Introduction
Lung cancer has the highest mortality rate of all cancers and is the leading cause of cancer-related deaths worldwide [1,2]. About 85% of lung cancers are classified as non-small-cell lung cancer (NSCLC) and the remaining 15% as small-cell lung cancer (SCLC). The two predominant NSCLC histological phenotypes are adenocarcinoma and squamous cell carcinoma (LUSC) [3]. LUSC is associated with greater mortality and morbidity because it is highly invasive, often invading neighboring tissues, and can metastasize to distant organs [4]. Treatments for LUSC are often less effective, and chemotherapy remains the major therapeutic choice [5]. Mortality is considerably higher in patients at advanced stages than in those who receive early intervention. At present, LUSC still lacks effective molecular targets for the development of targeted therapy. Therefore, it is necessary to explore novel biomarkers for prognosis prediction and for the development of molecular targeted therapy for LUSC patients [1,6].
MicroRNAs (miRNAs), a key component of the small noncoding RNA family, are approximately 18-25 nucleotides long and are involved in the post-transcriptional regulation of gene expression [7]. By binding to the 3' or 5' untranslated region of target transcripts, miRNAs can modulate gene expression through translational repression or cleavage of mRNA [7,8]. miRNAs have been shown to be aberrantly expressed in various types of malignancies and to function either as oncogenes or as tumor suppressors. Accumulating evidence has demonstrated that miRNAs regulate various processes in carcinogenesis, including cell maturation, cell proliferation, migration, invasion, autophagy, apoptosis and metastasis [9]. This suggests that miRNAs are potential noninvasive biomarkers for cancer [8] and could serve as promising markers in diagnosis, prognosis and personalized targeted therapy [9,10].
Although a large number of studies have examined the relationship between miRNAs and lung cancer, and some marker molecules have been identified to predict clinical survival, most have focused on lung cancer in general or on NSCLC, and few have addressed the squamous cell carcinoma subtype. Previous studies also contain many inconsistencies, which may be due to small sample sizes, heterogeneous histological subtypes, different detection platforms and various data processing methods. The Cancer Genome Atlas (TCGA) is a large-scale, collaborative effort led by the National Cancer Institute and the National Human Genome Research Institute to map the genomic and epigenetic changes that occur in 32 types of human cancer, including nine rare tumors [11]. TCGA contains a large amount of high-throughput miRNA sequencing data on LUSC. The miRNA sequencing data of LUSC and normal tissues used in this study, together with the clinical data of the samples, were downloaded from the TCGA database. By analyzing differentially expressed miRNAs and verifying them in the Gene Expression Omnibus (GEO) database, we examined their correlation with patient survival and identified a miRNA that can effectively predict patient survival. On this basis, we analyzed the protein expression and biological function of the potential target miRNAs, providing a new understanding of the molecular mechanism of LUSC.

Results
The TCGA data comprised miRNA expression information for 387 samples, including 342 LUSC tissue samples and 45 normal tissue samples (S1 Table), together with clinical information for 337 samples (S2 Table), mainly diagnosis age, sex, smoking history category, metastasis, lymph node stage and tumor stage (Table 1). According to the cut-off criteria, under the condition of P<0.05, the expression was up-regulated by logFC≥2, and the expression of

Dataset screening and verification of GEO database.
A total of four eligible data sets were selected from the GEO database: GSE16025, GSE19945, GSE51853 and GSE74190 (S3 Table). Because the sample sizes of the GEO data sets were relatively small, the cut-off criteria were set at P<0.05 and |logFC|≥1. The differentially expressed miRNAs obtained from the four data sets were intersected (Fig. 2), yielding a total of eight differentially expressed miRNAs shared by all four data sets. These were then matched against the differentially expressed miRNAs obtained from the TCGA database, giving four up-regulated miRNAs (hsa-mir-205, hsa-mir-210, hsa-mir-182 and hsa-mir-224) and one down-regulated miRNA (hsa-mir-451).
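The validation logic described above can be sketched as plain set operations. This is a minimal illustration, not the code used in the study: the function name `validate_mirnas` and the miRNA identifiers in the toy input are hypothetical placeholders, and only the logic (intersect the GEO data sets, then match against the TCGA result) is taken from the text.

```python
from functools import reduce


def validate_mirnas(geo_sets, tcga_set):
    """Return miRNAs shared by every GEO data set and also present in TCGA.

    geo_sets: list of sets of differentially expressed miRNA names,
              one set per GEO data set.
    tcga_set: set of differentially expressed miRNA names from TCGA.
    """
    if not geo_sets:
        return set()
    # miRNAs that are differentially expressed in all GEO data sets
    shared = reduce(set.intersection, geo_sets)
    # keep only those that were also found in the TCGA analysis
    return shared & tcga_set


# Hypothetical toy input, not the study's data
geo = [
    {"mir-a", "mir-b", "mir-c"},
    {"mir-a", "mir-b"},
    {"mir-a", "mir-b", "mir-d"},
    {"mir-b", "mir-a"},
]
tcga = {"mir-a", "mir-e"}
validated = validate_mirnas(geo, tcga)
```

In practice the up-regulated and down-regulated lists would be passed through the function separately, so that a miRNA is only retained when its direction of change agrees across sources.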

Target gene prediction and functional enrichment analysis.
The target genes of hsa-mir-210 were predicted using the TargetScan, miRDB, starBase and miRanda online analysis tools. A total of 520 genes appearing in two or more databases were identified (Fig. 5). (Fig. 6). The HIF-1 signaling pathway, MAPK signaling pathway and VEGF signaling pathway are interrelated and play a role in tumor growth and metastasis (Fig. 7).
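The "two or more databases" rule can be illustrated with a short counting sketch. It assumes the predictions from each tool have already been exported as lists of gene symbols; the function name `genes_in_multiple_tools` and the gene symbols below are hypothetical and do not reflect the actual output of any of the four tools.

```python
from collections import Counter


def genes_in_multiple_tools(predictions, min_tools=2):
    """Return genes predicted by at least `min_tools` prediction tools.

    predictions: dict mapping a tool name to an iterable of gene symbols.
    """
    counts = Counter()
    for genes in predictions.values():
        # use a set so that a gene is counted at most once per tool
        counts.update(set(genes))
    return {gene for gene, n in counts.items() if n >= min_tools}


# Hypothetical gene symbols, for illustration only
predictions = {
    "TargetScan": ["GENE1", "GENE2", "GENE3"],
    "miRDB": ["GENE2", "GENE4"],
    "starBase": ["GENE3", "GENE2"],
    "miRanda": ["GENE5"],
}
consensus = genes_in_multiple_tools(predictions)
```

Requiring agreement between at least two tools is a common way to reduce false positives from any single prediction algorithm, at the cost of missing targets that only one tool reports.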

Discussion
By 2015, 1.59 million people worldwide had died of lung cancer, of whom 30-40% had been diagnosed with NSCLC (squamous cell carcinoma). Lung cancer is also the leading cause of cancer death in China, and LUSC accounts for a high proportion of cases. Because most LUSC patients are diagnosed at an advanced stage, with high surgical risk and poor cardiopulmonary function, LUSC has been considered a refractory solid tumor [12][13][14]. If markers for early diagnosis can be found, the diagnosis rate will be greatly improved, and early intervention will improve the overall survival time of patients. In this study, through screening and validation, we found five miRNAs (hsa-mir-205, hsa-mir-210, hsa-mir-182, hsa-mir-224, hsa-mir-451) that were significantly differentially expressed between tumor tissues and normal tissues, and one of them (hsa-mir-210) was significantly correlated with patient survival. After multivariate analysis, it remained significantly correlated with patient prognosis and is expected to be a new marker for predicting survival.
In the past decade, miRNAs have become biomarkers for diagnosis, prognosis and prediction of treatment response, both in tumor specimens and in biological fluids [15]. For lung cancer, the role of miRNAs was reported as early as 2004, with evidence that more than half of miRNA genes are located in cancer-associated genomic regions or fragile sites, and that several miRNAs are deleted in lung cancer cell lines and expressed at low levels in chronic lymphocytic leukemia [15,16]. Subsequently, numerous studies have been published on the role of miRNAs in lung cancer. For example, down-regulated expression of hsa-mir-199 in lung cancer is closely related to staging, distant metastasis and poor prognosis, and hsa-mir-199 may inhibit the malignant progression of lung cancer by interacting with RGS17 [17]; hsa-mir-30e plays an inhibitory role in NSCLC and may inhibit cell proliferation and invasion by directly targeting SOX9 [18]; hsa-mir-451 regulates the survival of NSCLC cells partially through the downregulation of RAB14, and targeting the hsa-mir-451/RAB14 interaction might serve as a novel therapeutic approach for NSCLC patients [19]. In addition, hsa-mir-373 [20], the hsa-mir-17-92 cluster [21,22], hsa-mir-21 [23], hsa-mir-126 [24], hsa-mir-145 [25] and hsa-mir-340 [26] were also found to be closely related to the occurrence, progression and survival of lung cancer.
In this study, through the difference analysis of the TCGA database and the verification of the [29], another study also showed that hsa-mir-182, hsa-mir-210 and three other miRNAs can be used as new diagnostic markers for LUSC [30], and in stage Ⅱ LUSC patients, high expression of hsa-mir-182 is an independent positive prognostic factor [31]. There are relatively few reports on hsa-mir-224, but it could be used together with other miRNAs as an indicator for evaluating the palliative effect in advanced LUSC [32]. Studies of hsa-mir-451 have also been reported in NSCLC [33]. Taken together, the five differentially expressed miRNAs provide a new direction for the diagnosis of LUSC and may be used as diagnostic markers to improve the clinical diagnosis rate.
Subsequently, we analyzed the relationship between the five differentially expressed miRNAs and overall survival (OS), and found that the expression level of hsa-mir-210 was significantly correlated with OS [34,35]. However, the relationship between hsa-mir-210 expression and survival in LUSC has rarely been reported. In this study, Kaplan-Meier (KM) survival analysis showed that the overall survival of patients with low expression of hsa-mir-210 was significantly longer than that of patients in the high expression group, and the difference between the two groups was statistically significant (P=0.007).
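For readers unfamiliar with the KM estimate, the following minimal sketch shows how a survival curve is computed for one expression group. It is illustrative only: the split at the median expression value is an assumption (the text above does not state how the high and low groups were defined), the helper names `kaplan_meier` and `split_by_cutoff` are hypothetical, the toy values are not study data, and the significance test between groups is not implemented here.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def split_by_cutoff(expression, cutoff):
    """Return (high, low) lists of patient indices relative to `cutoff`."""
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


# Hypothetical toy data; the median cutoff is an assumption
expression = [1.0, 4.0, 2.0, 5.0, 3.0]
times = [15, 5, 12, 8, 8]
events = [0, 1, 1, 1, 0]
high, low = split_by_cutoff(expression, median(expression))
high_curve = kaplan_meier([times[i] for i in high], [events[i] for i in high])
low_curve = kaplan_meier([times[i] for i in low], [events[i] for i in low])
```

A full analysis would additionally compare the two curves with a log-rank test to obtain a P value.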
At the same time, the T stage of the tumor was also significantly associated with survival in the multivariate Cox analysis. We then analyzed the relationship between hsa-mir-210 expression and survival in different T stage subgroups, and found that in both the early (T1, T2) and late (T3, T4) subgroups, the overall survival of patients in the low expression group of hsa-mir-210 was significantly better than that of patients in the high expression group. Based on these results, hsa-mir-210 is an independent prognostic risk factor affecting the survival of patients with LUSC and can be used as a potential prognostic factor.
To further investigate hsa-mir-210, we predicted its target genes and then analyzed the related pathways and functions. In this analysis, hsa-mir-210 was closely related to positive and negative regulation of cell proliferation, DNA transcription, the VEGF signaling pathway, the MAPK signaling pathway, the HIF-1 signaling pathway and the choline metabolism pathway. Moreno Roig et al. reported that the expression of HIF-1 in NSCLC has the effect of increasing radiosensitivity [36]; the MAPK signaling pathway has also been reported to be involved in cell proliferation, differentiation, migration, aging and apoptosis [37]; Glunde K et al. also reported in detail on the role of choline metabolism in the molecular diagnosis of tumors [38]. Therefore, the molecular mechanisms of these pathways need further study to provide new clinical interventions for LUSC and improve patient survival. The pathway interactions also point to a role of hsa-mir-210 in the growth, invasion and metastasis of LUSC. To sum up, we analyzed five potential miRNAs that may be signaling molecules for the diagnosis of LUSC, and identified hsa-mir-210 as a potential prognostic factor for LUSC. Further studies are needed to validate our findings in large samples, and further functional studies are needed to explore the molecular mechanisms of these miRNAs in the progression of LUSC.
Data download and processing.
A total of 1047 miRNAs were included in the analysis. Under the condition of P<0.05, miRNAs with logFC≥2 were considered up-regulated and those with logFC≤-2 were considered down-regulated.
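The cut-off rule above can be written as a small classification function. This is a sketch for illustration only: the function name `classify_mirna` and its parameter names are hypothetical, and it assumes the logFC and P values have already been computed by a differential expression tool.

```python
def classify_mirna(log_fc, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a miRNA as 'up', 'down' or 'not significant'.

    Defaults follow the TCGA criteria stated in the text
    (P<0.05, logFC>=2 up-regulated, logFC<=-2 down-regulated).
    """
    if p_value >= p_cutoff:
        return "not significant"
    if log_fc >= fc_cutoff:
        return "up"
    if log_fc <= -fc_cutoff:
        return "down"
    return "not significant"


# Hypothetical values, not taken from the study
labels = [
    classify_mirna(2.5, 0.01),
    classify_mirna(-2.0, 0.04),
    classify_mirna(3.0, 0.20),
    classify_mirna(1.5, 0.001),
]
```

Passing `fc_cutoff=1.0` gives the looser |logFC|≥1 criterion that the text applies to the GEO data sets.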

Screening and Analysis of Validation Datasets in GEO Database.
We searched the GEO website (https://www.ncbi.nlm.nih.gov/geo/) using the keywords (microRNA OR "miR" OR "miRNAs") AND (Lung OR pulmonary) AND (Cancer OR tumor OR neoplasm OR malignancy OR carcinoma OR SCC OR NSCLC), selecting "series" as the data type and "Homo sapiens" as the species. For data sets related to NSCLC, the inclusion criteria were: 1) the samples were diagnosed as NSCLC with squamous cell carcinoma as the pathological type; 2) both cancerous tissue specimens and normal lung tissue specimens (healthy human lung tissue or normal lung tissue adjacent to the cancer) were included; 3) the data set contained more than 5 tumor samples and normal samples; 4) the data were miRNA expression profiles of the samples; 5) the data set could be analyzed with the GEO online R-based analysis tool. The GEO analysis tool was then used to obtain the differential expression results for each of the four data sets, with P<0.05 and |logFC|≥1 as the screening criteria. Based on the TCGA differential analysis and the validation in the GEO data sets, four highly expressed miRNAs were obtained (hsa-mir-205, hsa-mir-210, hsa-mir-182 and hsa-mir-224), along with one miRNA with low expression (hsa-mir-451).

Analysis of relationship between differentially expressed miRNAs and patient prognosis.
Differentially expressed miRNAs were divided into high expression group and low expression and gene count≥5 were set as the cut-off criteria.