Abstract
Hepatocellular carcinoma (HCC) is a major cause of cancer-related death worldwide and has a poor prognosis. Pyroptosis, a form of programmed necrotic cell death mediated by gasdermins, participates in tumor progression. Multi-omics analysis has recently been applied frequently to reach more comprehensive and precise conclusions. However, multi-omics analyses of pyroptosis-related signatures in HCC, and the correlations of these signatures with prognosis, remain unclear. Here, we identified 42 pyroptosis genes that were differentially expressed between HCC and normal hepatocellular tissues. Based on these differentially expressed genes (DEGs), all HCC cases could be divided into two heterogeneous subtypes. We then evaluated the prognostic value of the differentially expressed pyroptosis-related genes to construct a multigene model using The Cancer Genome Atlas (TCGA) cohort. A 22-gene model was built by the least absolute shrinkage and selection operator (LASSO) Cox regression method and classified HCC patients in the TCGA cohort into low-risk and high-risk groups. Patients in the low-risk group had a significantly higher probability of survival than those in the high-risk group (p<0.001). Furthermore, the related genes and the two risk groups were analyzed with multiple omics across different molecular layers. The pyroptosis-related gene model was validated with HCC patients from a Gene Expression Omnibus (GEO) cohort, in which the low-risk group showed longer overall survival (OS) (p=0.018). The risk score was an independent factor for predicting the OS of HCC patients. In conclusion, pyroptosis-related genes in HCC are correlated with tumor immunity and could be used to predict the prognosis of HCC patients.
Introduction
Hepatocellular carcinoma (HCC) accounts for >80% of primary liver cancers worldwide. HCC also causes a heavy disease burden and is estimated to be the fourth most common cause of cancer-related death worldwide[1]. The main risk factors for HCC are hepatitis B virus and hepatitis C virus infection[2]. Moreover, non-alcoholic steatohepatitis associated with metabolic syndrome is becoming a more frequent risk factor[3]. Current treatment options include surgical resection, chemotherapeutics, immunotherapies, new drug delivery methods and combination therapy[4]. However, the adjusted incidence and death rates have continued to increase[5]. In addition, non-invasive diagnosis is currently challenged by the requirement for molecular information that needs tissue or liquid biopsies[3]. Thus, there is still an urgent need for effective combination therapies and advanced detection methods for early-stage hepatocellular carcinoma.
Multi-omics analysis provides an integrative approach that maximizes comprehensive biological insight across molecular layers[6]. Pyroptosis, a novel form of regulated necrotic cell death that is mainly induced by gasdermins, plays a crucial role in cancer and hereditary diseases[7]. Pyroptosis is an inflammatory form of cell death, and pyroptotic cells are characterized by cellular swelling and bubble-like protrusions[8]. Pyroptosis can be triggered by the canonical caspase-1 inflammasomes or by activation of caspase-4, -5 and -11 by cytosolic lipopolysaccharide[9]. The canonical caspase-1 inflammasome then cleaves the effector molecule gasdermin D (GSDMD) and promotes its oligomerization to form large pores in the plasma membrane, causing cell death[10]. In addition, cytokines such as IL-18 and IL-1β are activated during pyroptosis. Pyroptosis also has a crucial role in the proliferation and migration of cancer cells, regulated by molecules such as non-coding RNAs[11]. Furthermore, pyroptosis-induced inflammation has been shown to trigger robust antitumour immunity and can synergize with checkpoint blockade[12]. These findings collectively demonstrate that pyroptosis has significant roles in tumour development and in antitumour processes. However, its specific functions in HCC have not been reported from a multi-omics perspective. We therefore performed a multi-omics study to determine the functions of pyroptosis-related genes in HCC; to explore the gene copy number variants, mutations, immune cell correlations, tumor stem cell correlations and drug sensitivities of the related genes and of the two risk groups; and to establish a robust pyroptosis-based prognostic model for detecting early-stage HCC.
Materials and Methods
Dataset Collection
The HCC RNA-seq count data and clinical profiles were obtained from the TCGA GDC database (https://portal.gdc.cancer.gov/) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/, ID: GSE20140). The FPKM data normalized from RNA-seq counts were transformed to log2(TPM+1) for further analysis. The copy number variation and simple nucleotide variation data were downloaded from the TCGA GDC database (https://portal.gdc.cancer.gov/). All expression data were normalized before analysis. Patients were excluded if they died within 30 days or did not have prognostic information.
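The FPKM-to-log2(TPM+1) transformation mentioned above can be illustrated with a minimal sketch. It assumes the standard conversion, in which each sample's FPKM values are rescaled so that they sum to one million; the function name and the small expression matrix are hypothetical and are not taken from the study's own scripts, which were written in R.

```python
import numpy as np

def fpkm_to_log2_tpm(fpkm):
    """Convert a genes x samples FPKM matrix to log2(TPM + 1).

    Assumes the standard rescaling: TPM = FPKM / sum(FPKM per sample) * 1e6.
    """
    fpkm = np.asarray(fpkm, dtype=float)
    # Sum over genes (rows) separately for each sample (column).
    per_sample_total = fpkm.sum(axis=0, keepdims=True)
    tpm = fpkm / per_sample_total * 1e6
    return np.log2(tpm + 1.0)

# Hypothetical matrix: 3 genes (rows) x 2 samples (columns).
fpkm = np.array([[10.0, 0.0],
                 [30.0, 5.0],
                 [60.0, 15.0]])
log_tpm = fpkm_to_log2_tpm(fpkm)
```

After the transformation, undoing the logarithm recovers TPM values that sum to one million in every sample, which is a quick sanity check on the conversion.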
Analysis of Differential pyroptosis-related genes in HCC
A total of 55 pyroptosis-related genes were collected from prior articles. Differentially expressed genes (DEGs) between HCC samples and matched normal tissues were identified with the “limma” package in R, using an adjusted p-value < 0.05, and visualized as a heatmap. The protein-protein interaction network for the DEGs was generated using the STRING website (https://cn.string-db.org/). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed with the R “clusterProfiler” package to explore the molecular mechanisms underlying risk.
Construction and Validation of the prediction model for HCC
Cox regression analysis was used to screen the prognostic DEGs. The screened genes were further narrowed down by the LASSO Cox regression model (R “glmnet” package) to develop the prognostic model. After standardization and normalization of the expression data, the risk score was calculated from the screened 22-gene signature as follows: risk score = $\sum_{i=1}^{n} X_i \times Y_i$ (X: coefficients, Y: gene expression level, n: number of genes in the signature). Subsequently, the patient samples from the TCGA and GEO cohorts were divided into low- and high-risk groups based on the model. Kaplan-Meier survival curves were plotted with the R “survival” package to compare the clinical outcomes of the two groups. The R “survminer” and “timeROC” packages were applied to assess the survival and prognosis of patients.
Multi-Omics Data Analysis
To characterize the mutations of DEGs in the samples, the R “maftools” package was applied. The R “CIBERSORT” package was used to assess immune cell infiltration in the samples, and the R “ggplot2” package was used to visualize the correlations between immune cells and risk genes. To explore the tumor microenvironment (TME), immune-related scores were evaluated with the R “estimate” package. The drug sensitivity analysis was performed with the R “pRRophetic” package. The R “ggpubr” and “ggExtra” packages were employed to assess the correlation between the tumor stem cell index and risk.
Statistical Analysis
One-way ANOVA was applied to assess differences in gene expression. The Pearson chi-square test was used to compare the profiles between two subgroups. Differences in OS between two subgroups were assessed by the Kaplan-Meier method with a two-sided log-rank test. Hazard ratios (HRs) were calculated by univariate and multivariate Cox regression analysis. A p-value less than 0.05 was considered statistically significant. All statistical analyses were performed with R software 4.0.1.
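The two-sided log-rank comparison of two survival groups can be sketched as follows. This is a minimal illustration of the standard two-group log-rank statistic (observed minus expected events with a hypergeometric variance, referred to a chi-square distribution with one degree of freedom); it is not the R “survival” code used in the study, and the function name and all follow-up times are hypothetical.

```python
import math
import numpy as np

def logrank_test(time, event, group):
    """Two-group log-rank test.

    time  : follow-up time per patient
    event : 1 if death was observed, 0 if censored
    group : 1 for the first group (e.g. high risk), 0 for the second
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one death occurred.
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                # patients at risk in both groups
        n1 = (at_risk & group).sum()     # patients at risk in group 1
        died = (time == t) & event
        d = died.sum()                   # deaths at time t in both groups
        d1 = (died & group).sum()        # deaths at time t in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square(1) variable: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical follow-up data (months): four high-risk and four low-risk patients.
time = [5, 8, 12, 20, 15, 25, 30, 40]
event = [1, 1, 1, 0, 1, 0, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
chi2, p_value = logrank_test(time, event, group)
```

The statistic is symmetric in the group labels, so swapping the high- and low-risk labels leaves the result unchanged.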
Results
Landscape of pyroptosis genes in HCC
A total of 55 pyroptosis-related genes were collected, and their expression was compared in The Cancer Genome Atlas (TCGA) data from 50 normal and 374 tumour tissues. We identified 42 differentially expressed genes (DEGs) (all p <0.05). Among them, 9 genes were downregulated while 32 other genes were upregulated in the tumour group compared to the normal group. The RNA levels of these genes are presented as heatmaps in Fig.1A. To further explore the interactions of these differentially expressed pyroptosis-related genes, we conducted a protein-protein interaction (PPI) analysis, and the results are shown in Fig.1B. The minimum required interaction score for the PPI analysis was set at 0.9. The correlation network containing all pyroptosis-related genes is presented in Fig.1C. Furthermore, the tumor mutational burden (TMB) of these genes in HCC samples was assessed (Fig.2A), and the frequency of copy number variations (CNV) of pyroptosis genes in HCC was evaluated (Fig.2B).
Tumour classification based on the expression level of pyroptosis genes
After removing the normal hepatocellular tissues, we used unsupervised clustering to classify the tumor samples into molecular subgroups based on pyroptosis-related genes. By increasing the clustering variable (K) from 2 to 9, we found that when K=2, the intragroup correlations were low, indicating that the HCC patients could be well divided into two clusters, termed C1 and C2, based on the 42 DEGs (Fig.3A, B). Combining the matched clinical profiles, we found that the overall survival (OS) time differed significantly between the two clusters (Fig.3C). To further evaluate the clinical significance of the subtypes, clinical outcomes and clinicopathological features were compared between the two clusters. The results showed that disease grade differed significantly between the two clusters (p<0.001) (Fig.3D).
Establishment of an accurate prognostic model using the TCGA cohort
Differentially expressed genes were screened between the two clusters. In addition, 79 HCC samples from the Gene Expression Omnibus (GEO) cohort (GSE20140) were selected. We then intersected the genes of the TCGA cohort, the genes of the GEO cohort and the differential genes from the two clusters. The intersected genes in the TCGA cohort, combined with matched clinical information, were analyzed by univariate Cox regression to screen for survival-related genes. The p-value threshold was 0.001. The majority of survival-related genes were associated with increased risk, with HRs>1 (Table.1). Through the least absolute shrinkage and selection operator (LASSO) Cox regression analysis, 22 genes were selected for the model according to the optimum λ value (Fig. 4A, B). The risk score was calculated as follows: risk score = (−0.004*SPP1 exp.) + (0.143*MYCN exp.) + (−0.03*PON1 exp.) … + (0.11*MT3 exp.) (Table.2). Based on the median score calculated by the risk score formula, 374 HCC patients from the TCGA cohort were divided into low- and high-risk subgroups (Fig.4C). Principal component analysis (PCA) showed that patients with different risks were well separated into two clusters (Fig.4D). There was a significant difference in OS time between the low- and high-risk groups (P<0.001) (Fig.4E). The time-dependent receiver operating characteristic (ROC) analysis showed that the areas under the ROC curve (AUC) were generally higher than 0.8 at 1, 3 and 5 years, indicating that the prognostic model has high predictive accuracy (Fig.4F).
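The arithmetic behind the risk score and the median split can be sketched as below. The sketch uses only the four coefficients that are written out in the formula above (SPP1, MYCN, PON1 and MT3); the remaining 18 terms are elided in the text and are deliberately not guessed here, so the resulting scores are illustrative only. The expression values, function names and patient count are hypothetical.

```python
import numpy as np

# Four of the 22 coefficients, as printed in the risk score formula.
# The other 18 are elided in the text and are intentionally left out.
coefficients = {"SPP1": -0.004, "MYCN": 0.143, "PON1": -0.03, "MT3": 0.11}

def risk_scores(expression, coefficients):
    """Weighted sum of expression values: sum over genes of X_i * Y_i."""
    genes = list(coefficients)
    weights = np.array([coefficients[g] for g in genes])
    # Rows are patients, columns are genes in the same order as `weights`.
    matrix = np.column_stack([expression[g] for g in genes])
    return matrix @ weights

def split_by_cutoff(scores, cutoff):
    """Label each patient 'high' if the score exceeds the cutoff, else 'low'."""
    return np.where(scores > cutoff, "high", "low")

# Hypothetical expression values for four patients.
expression = {
    "SPP1": np.array([8.0, 2.0, 5.0, 1.0]),
    "MYCN": np.array([1.0, 0.5, 3.0, 0.0]),
    "PON1": np.array([2.0, 9.0, 4.0, 10.0]),
    "MT3": np.array([0.5, 0.0, 2.0, 0.1]),
}
scores = risk_scores(expression, coefficients)
cutoff = np.median(scores)            # median of the training cohort
groups = split_by_cutoff(scores, cutoff)
```

Keeping the cutoff as an explicit argument means that the same training-cohort median can be reused to label patients from a separate validation cohort.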
Internal and external validation of the prognostic model using the GEO cohort
HCC patients from a Gene Expression Omnibus (GEO) cohort (GSE20140) were used as the validation set. The gene expression data were normalized before analysis. According to the median risk score from the TCGA cohort, patients from the GEO cohort were also divided into low- and high-risk groups (Fig.5A). The PCA showed satisfactory separation between the two subgroups (Fig.5B). Similar to the TCGA cohort, Kaplan-Meier analysis indicated that the low-risk group had better overall survival than the high-risk group (p=0.018) (Fig.5C). Moreover, the ROC curve analysis showed that the AUC was 0.846 for 5 years, 0.839 for 7 years, and 0.817 for 9 years (Fig.5D).
The prognostic value of the established risk model
First, univariate Cox regression was used to evaluate the prognostic value of features such as risk score, age and disease stage. The results showed that the risk score and disease stage were significant prognostic factors for patients (p<0.05) (Fig.6A). Subsequently, we entered these two factors into a multivariate analysis, which indicated that the risk score can serve as an independent prognostic factor (p<0.05) (Fig.6B). Moreover, we generated a risk heatmap of clinical characteristics based on the model genes. The heatmap showed that T staging, stage and grade of HCC were distributed differently between the high- and low-risk groups defined by these model genes (p<0.001) (Fig.6C).
Identification of DEGs and functional analysis
Differentially expressed genes (DEGs) were analyzed between the two risk subgroups defined by the risk model. The widely used “limma” R package was applied to identify the DEGs, with the filter criteria set at |log2FC| ≥ 1 and an adjusted p-value (FDR) < 0.05. In total, there were 73 DEGs between the two subgroups (Table.3). Subsequently, these 73 DEGs were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The enrichment results indicated that the biological functions contributing to the risk difference were mainly related to material metabolism and molecule secretion (Fig.6D, E).
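The DEG filter described above can be sketched as follows. The text does not name the multiple-testing correction, so this sketch assumes the Benjamini-Hochberg procedure, which is the default adjustment in limma; the log2 fold changes and p-values are hypothetical, and the code only illustrates the thresholding step, not limma's moderated statistics.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def select_degs(log2_fc, p_values, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Boolean mask of genes with |log2FC| >= fc_cutoff and FDR < fdr_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return (np.abs(np.asarray(log2_fc, dtype=float)) >= fc_cutoff) & (fdr < fdr_cutoff)

# Hypothetical results for four genes.
log2_fc = [2.0, -1.5, 0.4, 3.0]
p_values = [0.001, 0.01, 0.03, 0.5]
is_deg = select_degs(log2_fc, p_values)
```

In this toy input the third gene passes the FDR threshold but not the fold-change threshold, and the fourth gene passes the fold-change threshold but not the FDR threshold, so only the first two genes are retained.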
Comprehensive biological analysis with multiple dimensions
Immune cell infiltration in HCC samples from the TCGA cohort was evaluated, and the correlation between the model genes and the infiltrating immune cells was further analyzed. The results indicated that CD8 T cells and activated memory CD4 T cells had a strong positive correlation with the GZMH gene, while resting memory CD4 T cells had a strong negative correlation with GZMH (Fig.7A). Moreover, we evaluated the tumor microenvironment (TME) of HCC samples according to the immune-related scores (stromal score, immune score and ESTIMATE score). Comparing these scores between the two risk subgroups showed that the low-risk group had higher scores than the high-risk group in all three immune-related dimensions (Fig.7B). Furthermore, the risk score generated by the risk model had a significant positive correlation with the tumor stem cell index (RNAss) (Fig.7C). Finally, the sensitivity to several drugs with therapeutic potential was investigated in the two risk groups. Drugs such as Nilotinib, Bortezomib and Dasatinib had significantly different IC50 values between the low- and high-risk groups, with the high-risk group having the lower IC50 (Fig.8).
Discussion
Patients who have developed chronic fibrotic liver disease of viral or metabolic aetiology tend to develop HCC[13]. The key question, however, is how to reliably estimate HCC risk and diagnose HCC at an early stage. Unfortunately, no robust estimation system has been established, and many patients suffer severely from HCC[14]. To establish a precise prognostic model that addresses this urgent need, we integrated multi-omics analysis with the process of pyroptosis. Pyroptosis is a novel form of cell death that has pivotal roles in oncogenesis, immune cell infiltration and the antitumor response[15]. In this study, we collected 55 pyroptosis-related genes and found that most of them had different expression levels between HCC samples and the matched normal tissues, indicating that pyroptosis has important functions in HCC. Furthermore, we explored the protein-protein interactions and mutual regulatory relations of the 42 DEGs. GSDMD, CASP8, PYCARD and some CHMP family genes were the core interacting genes, and most of the interacting genes positively regulated one another. With respect to TMB and CNV, TP53 was the most frequently mutated gene, and the main mutation type was missense mutation. We also found that most of the DEGs had CNV in HCC samples: GSDMC, AIM2, GSDMD and CHMP6 had frequent copy number gains, while CASP9 had frequent copy number losses. 52 prognostic genes were identified using Cox analysis, and the HCC patients could be divided into two clusters by consensus clustering analysis based on the expression levels of the prognostic genes. It is noteworthy that the two clusters differed significantly in both OS rate and clinical features. This differs from the clusters used in other models for HCC, which showed no significant differences in clinical features[16, 17]. This suggests that our approach to model building may be more accurate and more clinically meaningful for predicting the prognosis of HCC patients. Moreover, we integrated multi-omics analysis to demonstrate the landscape of our gene signature and prognostic model across various molecular layers.
Subsequently, we performed LASSO and Cox analysis based on the prognostic genes and established a 22-gene signature prognostic model. As expected, this model separated patients with different risks well, and the two patient subtypes defined by the model had significantly different OS rates. Moreover, the AUCs of the ROC analysis at several time points were about 0.8. Together, these results support that our model is reliable and precise for predicting the prognosis of HCC patients. Importantly, the model was well validated in the internal test and external validation cohorts. Furthermore, both the univariate and multivariate Cox regression analyses showed that the risk score generated by the model can serve as an independent prognostic factor. To explore the potential biological functions and pathways that contribute to the risk of developing HCC, we performed GO and KEGG enrichment analysis. We found that material metabolism and molecule secretion could be mechanisms involved in the development of HCC. Recently, Zheng et al. depicted the landscape of tumor-infiltrating T cells across cancers, revealing the heterogeneity of T cells in cancer[18]. We therefore evaluated the relationship between infiltrating immune cells and the model genes. GZMH was found to correlate strongly with CD4 and CD8 T cells, which could provide new insight for further study. With respect to the TME, the low-risk group defined by our model had higher immune-related scores, indicating that low-risk patients may benefit from a better immune state[19]. We also found that the tumor stem cell index rises as risk increases. This result suggests that tumor stem cell content could be a risk factor for HCC and in turn supports the robustness of our model[20]. Finally, we screened drugs that differ in sensitivity between the two risk groups. The IC50 values of the screened drugs were significantly lower in the high-risk group, indicating that the high-risk group is more sensitive to these drugs than the low-risk group.
In conclusion, this was the first study to comprehensively investigate the role of pyroptosis in HCC with multi-omics analysis. We established a robust and accurate prognostic model for HCC. Compared with other published models, our model showed distinct advantages in several respects: we are the first to integrate multi-omics analysis to establish a pyroptosis-related model; we collected more pyroptosis genes and identified more prognostic genes for building the model; and the subtypes showed significant differences in clinical profiles both before and after the model was built. Together, the findings of our study provide a comprehensive landscape of molecular heterogeneity in HCC based on pyroptosis and may facilitate the precise management of HCC patients.
Author Contributions
JHH carried out the experiments, analyzed the data, prepared the figures and wrote the manuscript. YDW and MC conceptualized and designed this study. All authors have revised the manuscript and agreed to its publication.
Competing Interests
The authors declare no competing interests.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (81603119), the Natural Science Foundation of Beijing Municipality (7174316) and the Major Science and Technology Projects (2018ZX09303047).