Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that are crucially important in cancer. Dysregulation of miRNAs is commonly observed in cancers and is largely cancer dependent. It is thus important to study pan-cancer miRNA expression in order to develop accurate and sensitive biomarkers. Here, we developed the OncoMir Cancer Database (OMCD), a web server that allows easy and systematic comparative genomic analyses of miRNA sequencing data derived from over 9,500 cancer patients with associated clinical information and organ-specific controls present in The Cancer Genome Atlas (TCGA) database. The OMCD web server provides (1) simple visualization of TCGA miRNA sequencing data, (2) statistical analysis of differentially expressed miRNAs for each cancer type and (3) exploration of miRNA clusters across cancer types. The OMCD web server is freely available to all users and can be accessed at www.oncomir.umn.edu/omcd/.
Database URL: www.oncomir.umn.edu/omcd/
Background
MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally by binding to the 3’ untranslated region (UTR) of target messenger RNAs (1). Dysregulation of miRNAs has been associated with cancers such as colorectal cancer, lung cancer, lymphoma, glioblastoma, and osteosarcoma (2). This dysregulation makes miRNAs candidate biomarkers for diagnosis, classification, and prognosis, as well as potential therapeutic targets (2). The use of miRNAs as diagnostic and classification biomarkers has been approved by the United States Food and Drug Administration (FDA) for cancers of the lung, thyroid, and kidney, as well as for identifying cancer origin. However, because miRNA dysregulation is largely cancer dependent, it is important to study pan-cancer miRNA expression in order to develop accurate and sensitive biomarkers.
The Cancer Genome Atlas (TCGA) contains miRNA expression data for over 10,000 patients from 33 different types of cancer (3). There are currently two major web-based repositories of analyzed TCGA data: cBioPortal and GDAC Firebrowse (4). However, both platforms focus mainly on the analysis and visualization of genomic and mRNA data, and neither provides in-depth analysis and comparative visualization of miRNA data. One database, OncomiR, provides analysis of TCGA miRNA data and calculates miRNA markers as survival signatures (5). However, this database lacks simple visualization of the TCGA miRNA expression data and the ability to explore miRNA clusters. Here we developed the OncoMir Cancer Database (OMCD), which provides (1) simple visualization of TCGA miRNA data, (2) statistical analysis of differentially expressed miRNAs for each cancer type and (3) exploration of miRNA clusters across cancer types.
Construction and content
Method
The OMCD repository was created using the LAMP software bundle (Linux, Apache2, MySQL 5.0 and PHP) and HTML, as described previously (6). Briefly, the application is accessible to researchers across the globe. An Apache web server hosts the web application. PHP is used to generate the user interface and to communicate with the MySQL database at the backend. Normalized expression data, statistical results, and annotation data are stored within the database. A user-friendly graphical interface is provided to ease data retrieval and the selection of different criteria for data analysis. PHP generates the HTML content through a database-driven architecture that was designed to allow the incorporation of additional information.
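The database-driven retrieval described above can be illustrated with a minimal sketch. It is not the OMCD schema: the table layout, column names, function names, and toy values are all hypothetical, and Python's built-in sqlite3 module stands in for the MySQL backend and PHP layer purely so the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical expression table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE expression (
               mirna       TEXT NOT NULL,
               cancer_type TEXT NOT NULL,
               sample_id   TEXT NOT NULL,
               sample_type TEXT NOT NULL
                   CHECK (sample_type IN ('Tumor', 'Control')),
               value       REAL NOT NULL
           )"""
    )
    # Toy rows for illustration only; these are not TCGA measurements.
    rows = [
        ("miR-X", "COAD", "S1", "Tumor", 12.5),
        ("miR-X", "COAD", "S2", "Control", 9.0),
        ("miR-X", "READ", "S3", "Tumor", 11.0),
        ("miR-Y", "COAD", "S1", "Tumor", 3.0),
    ]
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def fetch_expression(conn, mirna, cancer_type):
    """Return (sample_id, sample_type, value) rows for one miRNA in one cancer type."""
    cursor = conn.execute(
        "SELECT sample_id, sample_type, value FROM expression "
        "WHERE mirna = ? AND cancer_type = ? "
        "ORDER BY sample_type, sample_id",
        (mirna, cancer_type),  # parameters are bound, not concatenated into the SQL
    )
    return cursor.fetchall()
```

A search for one miRNA in one cancer type then reduces to a single parameterized query, for example `fetch_expression(build_demo_db(), "miR-X", "COAD")`, whose rows a front end could render as a heat map or a numerical table.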
Data Analyses
The miRNA expression data of 9,656 cases (8,993 Tumor and 663 Control samples) spanning 33 cancer types were downloaded from the TCGA data repository (https://gdc.nci.nih.gov; Table 1). Differential expression was analyzed using a two-group t-test to determine miRNAs that were differentially expressed between (1) normal and tumor tissue for a given tumor type, (2) a given normal tissue and all other normal tissues and (3) a given tumor tissue and all other tumor tissues, depending on the availability of samples. It is important to note that each of these comparisons has a different statistical power and that the absence of a miRNA from a specific dataset may be due to this variable power.
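As a rough illustration of the two-group comparison described above, the following sketch computes a t statistic for one miRNA between two groups of samples. It assumes Welch's unequal-variance form, which the text does not specify; the function name `welch_t` is hypothetical. Converting the statistic to a p-value requires the t distribution (available in common statistics libraries) and is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for a two-group Welch t-test.

    group_a and group_b are sequences of normalized expression values
    for one miRNA, e.g. tumor samples and control samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")

    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")

    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

When one group is small, as is often the case for control samples, the degrees of freedom are low, which is one way the variable statistical power noted above shows up in practice.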
Table 1. Number of cases included in the OMCD for each cancer type.
Results
The OMCD repository is available at https://www.oncomir.umn.edu/omcd. OMCD provides four different types of search functions (Figure 1A). Here we use miR-21 expression in colon cancer (COAD) as an example. The current version of the OMCD repository contains 8 control samples and 272 tumor samples for COAD. Searching for miR-21 in COAD samples (Figure 1A, B) returns a heat map showing the absolute expression level of miR-21 in all COAD samples (Figure 1C). Users can also obtain the numerical expression data (Figure 1D; not completely shown due to limited space) and the relative expression data (Figure 1E). Clicking on the miRNA takes users to a page showing links to additional analyses (Figure 1F). Here the website provides detailed information about the location of the miRNA and the names of colocalized miRNAs (miRNA clusters), as well as internal links to the expression data of miR-21 in other cancers and to further statistical analysis (Figure 1H). Additionally, we provide external links to the miRDB website for target prediction and to a literature search on Google Scholar (7). From this page, users can visualize the expression of colocalized miRNAs in a heat map showing absolute expression (Figure 1G). This analysis displays the expression levels of colocalized miRNAs in all cancer types (not shown due to limited space), which can be viewed as absolute or relative heat maps and in numerical form. Finally, we show the statistical analyses of miR-21 comparing Control vs. Tumor, Tumor vs. Tumor and Control vs. Control across cancer types (Figure 1H). These three comparisons allow simple visualization of the expression patterns of miR-21 across different cancer groups.
To further demonstrate the utility of our database, we identified miRNAs that were significantly differentially expressed between tumor and normal control samples (p-value < 0.000001 and an absolute average fold change greater than 2) in 5 or more tumor-normal comparisons (Figure 2). Many of these miRNAs are well known in cancer and have been reported to be differentially expressed between tumor and normal tissue in a wide range of tumor types. For example, miR-21 is upregulated in most cancer types, consistent with previous reports (8). This suggests that miR-21 could be a general cancer biomarker but not a suitable biomarker for specific cancer types.
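The recurrence filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout (miRNA, cancer type, p-value, signed fold change), the function name `recurrent_mirnas`, and the reading of the fold-change criterion as an absolute value greater than 2 are all assumptions.

```python
from collections import defaultdict


def recurrent_mirnas(results, p_cutoff=1e-6, fc_cutoff=2.0, min_comparisons=5):
    """Find miRNAs that pass both cutoffs in enough tumor-normal comparisons.

    results: iterable of (mirna, cancer_type, p_value, fold_change) tuples,
             one per tumor-vs-normal comparison. fold_change is assumed to be
             signed (negative for downregulation in tumor).
    Returns a dict mapping each recurrent miRNA to the sorted list of cancer
    types in which it passed both cutoffs.
    """
    hits = defaultdict(set)
    for mirna, cancer_type, p_value, fold_change in results:
        if p_value < p_cutoff and abs(fold_change) > fc_cutoff:
            hits[mirna].add(cancer_type)

    return {
        mirna: sorted(cancers)
        for mirna, cancers in hits.items()
        if len(cancers) >= min_comparisons
    }
```

Collecting cancer types in a set means that a miRNA counts once per cancer type even if the input contains duplicate rows, so the threshold of 5 refers to distinct tumor-normal comparisons.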
Cancers from similar anatomical regions show more similar miRNA expression patterns, which indicates tissue-specific miRNA expression. For example, colon cancer (COAD) and rectal cancer (READ) have very similar miRNA expression patterns compared to other cancers. Although COAD and READ have similar overall miRNA expression patterns, miR-101-1 differs significantly between COAD tumor and control samples but not between READ tumor and control samples (Figure 2). Additionally, because miR-101 does not show significantly high expression in other cancers, it is reasonable to hypothesize that this miRNA is a biomarker for COAD. Similarly, miR-10b shows significantly high expression in hepatocellular carcinoma (LIHC) but not in other cancer types. These are two examples of the testable hypotheses that OMCD is able to generate. Further experimental validation is thus warranted to investigate the function of miR-101 in COAD and miR-10b in LIHC.
Discussion
Evidence from the past decade indicates that miRNAs play crucial roles in cancers. With the development of high-throughput sequencing technology, more high-throughput miRNA data are publicly available. Here we developed the OncoMir Cancer Database (OMCD), a simple web-based repository that allows easy and systematic comparative genomic analyses of miRNA expression data. Using OMCD, we were able to identify miR-101 as a biomarker candidate specifically for COAD. We found that the expression level of miR-101 is significantly higher in COAD tumors, but not in other tumors, compared to the respective normal controls. Previous studies, however, show different expression levels of miR-101 in colorectal cancer (9,10). Contrary to the TCGA-COAD data, which show high miR-101 in tumors, previous studies suggest that miR-101 is downregulated in colorectal tumors and that this miRNA acts as a tumor suppressor, where overexpression of miR-101 can inhibit tumor invasion and growth (9,10). Using the OncomiR database, we were able to verify that miR-101 is indeed overexpressed in COAD tumors based on the TCGA data. We think the mixed results for miR-101 in COAD between TCGA and other cohorts warrant further investigation into the function of miR-101 in COAD to ascertain whether miR-101 is a suitable biomarker for COAD. We also observed from OMCD that miR-10b could be a potential biomarker for LIHC (11). Previous studies confirmed that miR-10b is indeed highly expressed in LIHC and that it is involved in neoplastic transformation of liver cancer stem cells and promotes metastasis (12–14). Additionally, other studies show an oncogenic role of miR-10b in breast cancer, gastric cancer, and glioblastoma (15–18). These findings suggest that miR-10b has a multifaceted function in many cancers, and further studies are warranted to confirm whether it is a suitable biomarker candidate specifically for LIHC.
In the current version, OMCD contains data derived from 9,656 cancer patients with associated clinical information and organ-specific controls present in TCGA. To our knowledge, the OncomiR database (www.oncomir.org) is the only other online resource for analyzing miRNA expression data (5). The conflicting results we found between TCGA and other cohorts demonstrate a limitation of the current version of OMCD as well as of OncomiR, both of which lack miRNA datasets from other cancer patient cohorts. In addition, the OncomiR database lacks the option to analyze miRNA clusters. It is important to consider miRNA cluster members when studying miRNAs in cancers, especially when generating hypotheses from high-throughput data. This is because miRNA cluster members usually have similar expression levels but potentially vastly different biological functions. The ability to visualize and explore miRNA clusters in OMCD is crucial for developing a defensible hypothesis. In the future, we plan to expand OMCD by incorporating additional miRNA expression data sets from public data repositories such as Gene Expression Omnibus (GEO), Genomic Data Commons (GDC), and European Bioinformatics Institute (EBI). We believe this will significantly improve the ability to use OMCD to develop defensible hypotheses.
Conflict of interest
The authors declare no conflicts of interest.
Acknowledgments
SS is supported by NIH/NCI grant R03CA219129, and CY by a research fellowship from the Bioinformatics and Computational Biology graduate program, University of Minnesota. We thank Dr. Mary Knatterud for assisting in manuscript preparation.